Many modern web and mobile apps rely heavily on real-time communication to provide a fast and consistent experience to users. Entire categories of applications would limp along without the ability to broadcast new data out to everybody connected. Imagine collaboration tools, live editors, sports scores, and stock tickers all stuck polling for changes.
Recently I led a team that overhauled a client-side experience to handle a continuous flow of data from everybody participating in the project. Speed, consistency, and reliability were critical for the new implementation's success. Naturally we turned to Pusher, but we didn't get what we were looking for.
Where an Existing Platform Let Us Down
Of course we tried the existing industry standard solution first. Configuration and setup were simple enough, but the service quickly fell over under even a modest workload. What follows are only a few of the problem areas that we encountered.
Reliability
Broadcasting events would often time out after 5 seconds. More precisely, "often" means up to 30% of the time. To combat unpredictability and latency, events were broadcast in the background and automatically retried. Automatic retries may assuage timeouts over time, but they introduce rampant race conditions. Imagine a scenario where an event that adds some data fails to send, but the subsequent event that removes that data sends immediately. When the retry finally succeeds, the client is left holding data that should have been removed.
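To make that race concrete, here is a minimal sketch in Python. It is not how Pusher or our application worked; the `Event` type, the `seq` field, and both client classes are hypothetical names invented for illustration. It shows a client that applies events in arrival order ending up with stale data, and one common mitigation (a publisher-assigned sequence number) that avoids it.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    seq: int     # hypothetical publisher-assigned sequence number
    action: str  # "add" or "remove"
    item: str


def apply_event(items: set, event: Event) -> None:
    """Mutate the client's local state according to one event."""
    if event.action == "add":
        items.add(event.item)
    else:
        items.discard(event.item)


@dataclass
class NaiveClient:
    """Applies events in whatever order they happen to arrive."""
    items: set = field(default_factory=set)

    def receive(self, event: Event) -> None:
        apply_event(self.items, event)


@dataclass
class OrderedClient:
    """Buffers early arrivals and applies events strictly in seq order."""
    items: set = field(default_factory=set)
    next_seq: int = 1
    pending: dict = field(default_factory=dict)

    def receive(self, event: Event) -> None:
        self.pending[event.seq] = event
        # Drain every buffered event that is now next in line.
        while self.next_seq in self.pending:
            apply_event(self.items, self.pending.pop(self.next_seq))
            self.next_seq += 1


add = Event(seq=1, action="add", item="comment-42")
remove = Event(seq=2, action="remove", item="comment-42")

# The "add" broadcast timed out and was retried, so it arrives after "remove".
arrival_order = [remove, add]

naive, ordered = NaiveClient(), OrderedClient()
for event in arrival_order:
    naive.receive(event)
    ordered.receive(event)

print(naive.items)    # the removed item has reappeared
print(ordered.items)  # empty, as the publisher intended
```

Sequencing like this only helps if every event eventually arrives. A permanently lost event would stall the ordered client, so a real implementation would also need a gap timeout or a resync step.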
Inflexibility
Any payload over a seemingly arbitrary threshold of 10 kilobytes could not be delivered. It was quite common for a JSON payload to include a lot of text, lengthy URLs, or numerous associations for sideloading. Engineering solutions to this problem, such as compressing data or only sending a delta, are possible, but neither is foolproof and both introduce more complexity.
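As a rough illustration of the compression workaround, the following Python sketch wraps a payload in an envelope and compresses it only when it exceeds the limit. The envelope shape and the function names are assumptions made for this example, not part of any provider's API, and the size check measures the body only, ignoring the few bytes the envelope itself adds.

```python
import base64
import json
import zlib

MAX_PAYLOAD_BYTES = 10 * 1024  # the 10 kilobyte threshold described above


def encode_payload(data: dict) -> dict:
    """Return a hypothetical envelope, compressing only when needed."""
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    if len(raw) <= MAX_PAYLOAD_BYTES:
        return {"encoding": "json", "body": raw.decode("utf-8")}

    # Base64 keeps the compressed bytes safe inside JSON, at ~33% overhead.
    packed = base64.b64encode(zlib.compress(raw, 9)).decode("ascii")
    if len(packed) > MAX_PAYLOAD_BYTES:
        # Compression is not foolproof: some payloads still will not fit.
        raise ValueError(f"payload is {len(packed)} bytes even after compression")
    return {"encoding": "zlib+base64", "body": packed}


def decode_payload(envelope: dict) -> dict:
    """Reverse encode_payload on the receiving side."""
    if envelope["encoding"] == "json":
        return json.loads(envelope["body"])
    return json.loads(zlib.decompress(base64.b64decode(envelope["body"])))


small = {"title": "hello"}
large = {"text": "lorem ipsum " * 2000}  # about 24 KB of repetitive text

small_envelope = encode_payload(small)
large_envelope = encode_payload(large)
print(small_envelope["encoding"], large_envelope["encoding"])
```

Repetitive text like this compresses well, but payloads that are already dense can still exceed the limit after compression, and every client now has to understand two encodings. That is exactly the kind of added complexity mentioned above.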
The Tribulations of Rolling Your Own
All developers are prone to bouts of NIH Syndrome. Surely our team can implement a websocket solution ourselves?
Why Not Stick With Ruby?
Websockets and MRI simply don't play well together. Support for Rack Hijack is spotty and only works with certain servers. Even with hijack support working, you won't scale a threaded server like Puma up to thousands of concurrent connections. The Faye project and related libraries provide excellent tooling around websockets, but they won't work with Unicorn and provide no abstractions or instrumentation at all.
Use Another Stack Instead?
Jumping to another stack, such as Node.js or Erlang, is tricky enough by itself. On top of the issues with building out a relay, you need to support additional servers, additional deployments, and some sort of pub/sub or message broker. That is a lot of added complexity to distract your team from building your primary product.
Websockets are subject to browser security policies. Yes, it is a bad idea to open an insecure connection from a secure page; fortunately, browsers don't even allow it. That means the real-time server needs to handle SSL connections, adding another layer of complexity. In practice, SSL is usually terminated in front of Node by something like stunnel or nginx rather than in the Node process itself, making configuration even more complex. Additionally, cross-domain policies mandate a wildcard certificate or additional CORS setup.
What's Going on in There?
Without additional engineering effort, all messages within the system are zipping around inside a black box. There isn't any instrumentation on connections or performance. Tracking connection activity and messaging is just as important as monitoring HTTP traffic. Now it's time to get statsd involved too!
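For a sense of what that extra instrumentation involves, here is a small Python sketch that emits metrics in the plain-text StatsD line format (`name:value|type`). The metric names, the `RelayMetrics` class, and the choice of what to measure are all assumptions for illustration. The sender is injected so the example can collect lines in a list instead of opening a socket.

```python
import socket
import time
from typing import Callable


def counter(name: str, value: int = 1) -> str:
    return f"{name}:{value}|c"


def gauge(name: str, value: int) -> str:
    return f"{name}:{value}|g"


def timing(name: str, milliseconds: int) -> str:
    return f"{name}:{milliseconds}|ms"


def udp_sender(host: str = "127.0.0.1", port: int = 8125) -> Callable[[str], None]:
    """Build a fire-and-forget sender; 8125 is the conventional StatsD port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(line: str) -> None:
        sock.sendto(line.encode("ascii"), (host, port))

    return send


class RelayMetrics:
    """Hypothetical hooks a relay would call on connection and delivery."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send
        self.open_connections = 0

    def connected(self) -> None:
        self.open_connections += 1
        self.send(counter("relay.connections.opened"))
        self.send(gauge("relay.connections.open", self.open_connections))

    def disconnected(self) -> None:
        self.open_connections -= 1
        self.send(counter("relay.connections.closed"))
        self.send(gauge("relay.connections.open", self.open_connections))

    def delivered(self, started_at: float) -> None:
        # started_at is expected to come from time.monotonic().
        elapsed_ms = int((time.monotonic() - started_at) * 1000)
        self.send(counter("relay.messages.delivered"))
        self.send(timing("relay.messages.latency", elapsed_ms))


# Collect lines in memory; swap in udp_sender() to reach a real daemon.
lines: list = []
metrics = RelayMetrics(send=lines.append)
metrics.connected()
metrics.connected()
metrics.disconnected()
print("\n".join(lines))
```

Even this toy version has to be wired into every connect, disconnect, and delivery path of the relay, which is engineering time spent on plumbing rather than on the product.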
Introducing Snö
Building and maintaining your own solution is unquestionably the most expensive way to tackle the issue. The cost of a single developer (one who has worked on this exact problem before and knows precisely what to build) greatly exceeds years of subscription fees to an outside service. Yet no suitable service or stack existed when I went through all of these steps.
That is why we're introducing Snö, a reliable platform for websites and apps that need real-time messaging. Please take a look. If you like what you see, sign up for the waitlist and we'll let you know how it progresses.