
Live Streaming: Rails 4 Versus Node.js

One of the questions I get from readers of my book (Upgrade to Rails 4) is about Live Streaming. Live Streaming sounds like a very exciting new feature, but quite a few people are confused about it. What are some of the applications of Live Streaming and more importantly, can it compete with Node.js?

Let’s walk through an implementation of a chatbox in both Rails and Node.js and see what we can learn about both implementations. Afterwards, we’ll run a few benchmarks to see which implementation is more efficient. I open-sourced both the Rails 4 Live Streaming Chatbox and the Node.js Chatbox implementation.

The first thing I thought when reading about Live Streaming was the following: Rails 4 can compete with evented systems such as Node.js and Erlang. Hooray! This might be true. But to what extent?


Puma, Server-Sent Events and Redis

First of all, you’ll need to use a different web server than the standard WEBrick if you want to use live streaming. WEBrick buffers all output and spits out your results at once, which doesn’t make it particularly useful for streaming. Get a concurrent web server like Puma or Rainbows! instead. Puma should use less memory and offer better concurrency than Unicorn (on which Rainbows! is built). On MRI, the Global Interpreter Lock (GIL) ensures that only one thread can run at a time. But if you’re doing a lot of blocking I/O (such as HTTP calls to external APIs), Puma still improves MRI’s throughput by allowing that blocking I/O to run concurrently.
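Switching to Puma is essentially a one-line change (a minimal sketch, assuming a standard Rails 4 Gemfile):

# Gemfile
gem 'puma'

After a bundle install, you can boot the app with bundle exec puma -p 3000 instead of rails server. Let’s see what this means for ActionController::Live, the Rails 4 Live Streaming mixin.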

When we include ActionController::Live as a module in a controller, the actions in that controller will run in a separate thread, so make sure all your code is thread-safe. Every incoming request will be processed in a new, separate thread. Consider the following code:

class MyController < ActionController::Base
  include ActionController::Live

  def index
    # Each write is flushed to the client immediately instead of being buffered
    100.times do
      response.stream.write "hello world\n"
    end
    # Always close the stream, or the connection will be kept open
    response.stream.close
  end
end
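If you wire this action up to a route, you can watch the lines trickle in with curl (the route below is an assumption for illustration, not part of the original app):

# assuming config/routes.rb contains: get 'my' => 'my#index'
curl -i http://localhost:3000/my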

Each time the index method is called, a new thread is created and used to stream the message “hello world\n” down to the client. The default Puma configuration uses a maximum of 16 threads, which means that at most 16 clients can be streaming data at the same time. When a 17th request comes in, Puma will block it until one of the other 16 threads is done streaming. For our chat application this shouldn’t be a big problem, since our messages are rather short. Bear in mind, though, that due to the nature of the application a lot of threads will have to be created and destroyed, which adds a significant amount of overhead. Moreover, note that Rails reserves a database connection for every incoming request, even if the request doesn’t use the database. I therefore increased my database pool to 100 connections in my config/database.yml file:

development:
  adapter: sqlite3
  database: db/development.sqlite3
  pool: 100
  timeout: 5000

If for some reason you’re not happy with the default maximum of 16 threads, you can increase it, as sketched below. Note, however, that raising the maximum number of threads adds extra scheduling overhead and memory usage for our chatbox as well.
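The thread pool can be configured in config/puma.rb (a minimal sketch; this file is not part of the chatbox repo, and Puma also accepts -t min:max on the command line):

# config/puma.rb — a sketch; adjust the numbers to your workload
threads 0, 32 # let Puma scale between 0 and 32 threads
port 3000

The actual code that will be streaming chat messages to the browser is the following: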

def events
  response.headers['Content-Type'] = 'text/event-stream'
  sse = Streamer::SSE.new(response.stream)
  redis = Redis.new
  # subscribe blocks this thread; the block below runs for every published message
  redis.subscribe('messages.create') do |on|
    on.message do |event, data|
      # push the chat message down to the client as a Server-Sent Event
      sse.write(data, event: 'messages.create')
    end
  end
rescue IOError
  # Client disconnected
ensure
  redis.quit
  sse.close
end

In this controller method, I’m using a class that encapsulates Server-Sent Events (SSEs). Server-Sent Events are a technology for pushing notifications from a server to a browser client in the form of DOM events. Currently, not all browsers support them (read: IE). You need at least Chrome 9+, Firefox 6.0+, Opera 11+, Safari 5+, iOS Safari 4.0+, BlackBerry, Opera Mobile, Chrome for Android or Firefox for Android. For IE, you can use the EventSource polyfill. One of the downsides of SSEs is that they do not support bi-directional communication. That’s why we are using an SSE Down/Ajax Up approach: the browser opens a long-lived EventSource connection to the events route, over which the server pushes messages down as SSEs, and it sends new chat messages up to the server with plain AJAX requests. Check out this introductory presentation to SSEs for additional information.
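The Streamer::SSE class itself is small. The actual implementation ships with the chatbox repo, but a minimal sketch of such a wrapper could look like this:

module Streamer
  class SSE
    def initialize(io)
      @io = io
    end

    # Emit one event in the text/event-stream wire format, e.g.:
    #   event: messages.create
    #   data: {"name":"...","content":"..."}
    def write(data, options = {})
      options.each do |key, value|
        @io.write "#{key}: #{value}\n"
      end
      # data arrives pre-serialized as a JSON string from Redis
      @io.write "data: #{data}\n\n"
    end

    def close
      @io.close
    end
  end
end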

In the above code, we use a Redis connection to subscribe to all messages.create events. Each time we see such an event, we use SSE to send the message down to the client. Note that the events action creates its own Redis connection: once a connection issues SUBSCRIBE, Redis puts it in subscriber mode and it can no longer be used for regular commands. The message itself is published to Redis (over the shared $redis connection) in the create method.

def create
  response.headers['Content-Type'] = 'text/javascript'
  # Whitelist the incoming parameters (strong parameters, new in Rails 4)
  @message = params.require(:message).permit(:name, :content)
  # Publish the message; every subscribed events stream picks it up
  $redis.publish('messages.create', @message.to_json)
  render nothing: true
end
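For completeness, here’s a sketch of the routes these two actions assume (the real routes file lives in the repo; the /messages path matches the URL used in the benchmarks below, while the events path is an assumption):

# config/routes.rb — a sketch; the actual file ships with the chatbox repo
resources :messages, only: [:create] do
  collection do
    get :events # the SSE stream
  end
end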

Rainbows! or Puma?

Since Ruby has a few concurrent web servers, I decided to compare Puma to another web server, Rainbows!. Changing your web server is as easy as adding a new line to your Gemfile (gem 'rainbows'). You can create an additional Rainbows! configuration file to specify the number of processes and connections:

worker_processes 2 # assuming two CPU cores
Rainbows! do
  use :FiberSpawn        # serve each connection in a lightweight Fiber
  worker_connections 100 # up to 100 connections per worker process
end
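Assuming you saved this file as config/rainbows.rb, you can then boot the app with bundle exec rainbows -c config/rainbows.rb.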

Here, we use two worker processes, one for each CPU core, each allowing up to 100 worker connections. In the following section, we will look at the performance of the chatbox implementation in Rainbows!, Puma and Node.js (Express.js and Socket.io).


Benchmarks

All tests were executed on my local machine, a MacBook with a 2GHz Intel Core 2 Duo, 8GB of 1067MHz DDR3 memory and Mac OS X 10.8.3. I used Apache Benchmark (ab) to execute 10000 requests with 100 concurrent connections. The following graph is a visualization of the response times using Rainbows! and Puma:

The benchmark results are available as gists on GitHub for both Puma and Rainbows!. Using Puma, the 10000 requests finished after 63.415 seconds, constantly using 99.9% CPU and around 75MB of memory. At first sight, the graph shows that with 100 concurrent connections, the response times using Puma become pretty bad near the end of the 10000 requests, with the longest request taking around 60 seconds. I’m not entirely sure why this happens, but here’s one plausible explanation:

When the benchmark starts, 100 concurrent requests are sent to the web server. A maximum of 16 threads, and thus 16 requests, is allocated by Puma at once. The 17th request blocks until one of the 16 threads currently in use is finished. However, since we’re executing 100 concurrent requests, there will be 84 requests waiting (100 - 16). Looking at the requests in the puma.dat file (generated with ab -r -n 10000 -c 100 -T 'application/x-www-form-urlencoded' -g puma.dat -p ../live_streaming/post http://127.0.0.1:3000/messages), we see that exactly 84 requests were left waiting for execution. These are the requests that were issued first but never got allocated to a thread by Puma; as a result, they waited for the entire benchmark. I’m not sure why Puma behaves like this.

When using Rainbows!, we initiate two processes, one for each CPU core. The longest request takes around 1000ms, and more than 66% of all requests were executed within 614ms. All 10000 requests finished after 61.802 seconds, with a memory usage of around 122MB. This makes Rainbows! complete the full run slightly faster than Puma. Note, however, that Puma has better per-request response times: with Puma, 95% of all requests were served within 400ms. The other percentiles of requests served within a certain time (ms) can be found in both gists.

In the graph below, I removed the requests that took over 10000ms to get a better idea of how Rainbows! behaves in comparison to Puma:

As you can see, Puma has very good response times apart from the end of the benchmark. Finally, I also benchmarked a Node.js implementation that uses Express.js and Socket.io (WebSockets). Since WebSockets use a different protocol than plain HTTP, I wasn’t able to use Apache Benchmark to test this implementation. Instead, I wrote a small custom benchmark script, which can also be found in the GitHub repository; the results are in this gist. The benchmark tracks the number of connected users, the number of messages received and sent per second, and the number of messages received and sent per second per user.

The Node.js benchmark itself works as follows. First, we connect 100 concurrent clients to the Node.js server over a timespan of 1 minute (in order not to choke my system). Next, all clients start sending messages to the Node.js server. Using Node.js, I received 1670 messages per second, which is a huge performance boost in comparison to both Rails implementations. The complete message run finished in ~6 seconds, roughly 10 times as fast as both Rails implementations. The node process consumed an average of 75% CPU and used over 200MB of memory. While this benchmark is not as accurate as the Rails benchmarks, we can conclude that Rails 4 Live Streaming is still no match for a simple evented Node.js app. Moreover, Node has better WebSocket support and handles concurrency very well.

If you would like to run these benchmarks yourself, don’t forget to increase the number of allowed open file handles using ulimit -n 1000.

What do you think of Rainbows! and Puma? Do you have any idea why Puma blocks certain requests until the end? Let me know in the comments. If you liked this post or you are interested in other Rails 4.0 features, you might want to buy my book.
