This past Sunday morning was not a great one at Resurrection Online. First, here is a normal graph of connections from the evening service on June 13:
The blue line is the number of connected computers, and the green line is the estimated worship attendance. The black line at the bottom is the iPhone stream multiplied by 10 so that you can see it on the graph. Here is the graph from Sunday morning:
- The initial drop in traffic was the result of a loss of connection between the Resurrection servers and the Wowza Flash servers at Amazon that send out the Flash stream. The iPhone stream was not affected, likely because it was on a different subdomain.
- Losing the video feed causes people to reload the page, which puts a heavy load on our web server. The deep spike is when the web server was rebooted while people were trying to reconnect.
- The uptick around 11:27 came from people trying to reconnect. The Wowza Flash servers could not handle that reconnection traffic, so we were unable to hold on to those who tried to get back in.
- The main player was switched to the iPhone stream around 11:40 and the Flash servers were restarted.
- Restarting the Flash servers restored their functionality.
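The reconnect surge in the timeline above is the kind of load that is often softened by having each viewer's player wait a randomized, growing interval before retrying instead of reloading right away. Here is a minimal sketch of that idea (exponential backoff with jitter). It is an illustration only, not something our player does today; the function name `reconnect_delays` and the `base` and `cap` values are assumptions made up for this example.

```python
import random

def reconnect_delays(attempts, base=1.0, cap=60.0, rng=random):
    """Return one wait time in seconds for each reconnect attempt.

    Hypothetical helper: each delay is drawn uniformly between 0 and
    min(cap, base * 2**attempt), so viewers spread their retries out
    instead of all hitting the servers at the same moment.
    """
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles with every attempt until it reaches the cap.
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays

if __name__ == "__main__":
    for number, delay in enumerate(reconnect_delays(5), start=1):
        print(f"attempt {number}: wait {delay:.1f}s before retrying")
```

The random part matters as much as the growth: without it, everyone who lost the feed at the same instant would also retry at the same instant.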
Although the initial failure was out of our control, we are taking steps to mitigate the issue. Sending streams to two different Amazon availability zones, with a primary and a backup built in, and continuing to explore other content delivery networks should help prevent this in the future. We have a catastrophic failure like this less than 2% of the time; however, we can and need to do better than that.
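For the curious, the primary and backup idea can be sketched in a few lines: try the streams in order and use the first one that passes a health check. This is a simplified illustration under stated assumptions, not our actual setup; the `pick_stream` function, the example.com addresses, and the way health is checked are all hypothetical.

```python
def pick_stream(endpoints, is_healthy):
    """Return the first endpoint that passes the health check, or None."""
    for url in endpoints:
        if is_healthy(url):
            return url
    return None

# Hypothetical endpoints, listed with the primary first.
ENDPOINTS = [
    "rtmp://primary.example.com/live",
    "rtmp://backup.example.com/live",
]

if __name__ == "__main__":
    # Pretend the primary is down; a real check would probe the server.
    down = {"rtmp://primary.example.com/live"}
    print(pick_stream(ENDPOINTS, lambda url: url not in down))
```

With the primary marked as down, the sketch falls through to the backup address. A real health check would need to confirm that the stream is actually delivering video, not just that the server answers.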
Thanks to Ian for his quick work on Sunday morning to salvage the service for those who were able to continue worshipping, and to the entire IT team for the amazing work they do to make Resurrection Online and all of Resurrection’s tech happen well.