|al(j)ʊˈmɪnɪəm| mass noun - the chemical element of atomic number 13, a corrosion-resistant metal named after Habib Alamin, a computer programmer

Pitfalls to avoid when writing multiprocess Ruby code

So, a couple of weeks ago, I wrote a post on how fast forking can make your MRI Ruby code. There are a few things to be aware of, though, before you go out and take advantage of this.

Obviously, your code has to be thread-safe. I can’t tell you if your Rails app is thread-safe or not, but this Bear Metal blog post might be helpful.

The first pitfall I experienced was ActiveRecord. If you fork, you must use a different connection per process. I cannot recall the exact error message that you would get if you did not use a new connection (and I no longer have access to the codebase to quickly check that, nor am I going to start up a new Rails app to check).

I was writing a short-lived process, and simply saying ActiveRecord::Base.connection.reconnect! in the fork block was fine, up until I needed to test the code with RSpec. Then, I had to change the process to disconnect from the main process, establish a new connection for each subprocess (in the fork block), and then establish another new connection in the main process, which I did after waiting for all the child processes to finish.
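The disconnect, connect-per-child, reconnect sequence described above could look roughly like the sketch below. It assumes a Rails app where ActiveRecord is already loaded and configured. The method name `run_jobs_in_children` and the `db:` parameter are made up for illustration (the parameter only exists so the method can be exercised with a stand-in object in tests), and this is not the code from the original project.

```ruby
# Sketch only: assumes ActiveRecord is already loaded and configured.
# `run_jobs_in_children` and `db:` are hypothetical names.
def run_jobs_in_children(jobs, db: ActiveRecord::Base)
  # Drop the parent's connections so no child inherits a live database socket.
  db.connection_pool.disconnect!

  pids = jobs.map do |job|
    fork do
      db.establish_connection # fresh connection owned by this child
      job.call
    end
  end

  # Wait for each of our own children and collect their exit statuses.
  pids.map { |pid| Process.waitpid2(pid).last }
ensure
  # Reconnect the parent once the children are done (or if forking failed).
  db.establish_connection
end
```

Each element of `jobs` is expected to respond to `call`, for example a lambda that runs a batch of queries.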

Speaking of waiting for child processes, the second pitfall was tricky. I used Process.waitall to indiscriminately wait for all the child processes. That was fine while it was a short-lived process started from a Rake task, but once I needed to test that code, it started interacting with other code, some of which could also start its own child processes. In my case, we were using a gem called sunspot-rails-tester, which started a Sunspot process for integration tests. The problem here was two-fold.

First of all, Process.waitall was waiting for a never-ending child process to end. Changing that to Process.waitpid2 pid on each of my own child process IDs fixed the problem of my tests taking forever (literally) to finish unless an outside force stopped them. That’s a strike against imperative-style global state (of course, there was no variable involved here per se, but Process.waitall is analogous to calling Process.waitpid2 pid on each process ID in a global list of all child process IDs across the program).
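Here is a small sketch of waiting only for the children you forked yourself. The helper name `run_and_wait` is hypothetical, and the sleeping child merely stands in for a long-lived process started by unrelated code such as a test helper.

```ruby
# Hypothetical helper: forks one child per job and waits only for those children.
def run_and_wait(jobs)
  pids = jobs.map { |job| fork { job.call } }
  # Process.waitpid2 returns [pid, Process::Status]; keep the status.
  pids.map { |pid| Process.waitpid2(pid).last }
end

# Stand-in for a never-ending child owned by some other part of the program.
foreign_pid = fork { sleep }

# Returns as soon as our own two children finish. Process.waitall would
# block here until foreign_pid exited as well.
statuses = run_and_wait([-> { 1 + 1 }, -> { exit 2 }])
puts statuses.map(&:exitstatus).inspect

# Clean up the stand-in child so this example does not leave it running.
Process.kill("TERM", foreign_pid)
Process.waitpid2(foreign_pid)
```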

The second problem was being unable to connect to Sunspot in some of the integration tests that we already had. This one was because of an at_exit hook that sunspot-rails-tester had installed, which kills the child Sunspot process once the main process has completed, to avoid leaving an orphaned process. My child processes were inheriting this hook, so when they completed their tasks, they were killing the Sunspot process. I fixed this by registering my own at_exit hook in the fork block that simply exits with the right status code (which is important), using Kernel#exit!, which skips at_exit hooks, instead of Kernel#exit. at_exit hooks are called in the reverse order of their registration, so a hook registered in the child runs before any inherited hook, and exit! stops the inherited ones from ever running. We need our own at_exit hook because simply calling #exit! on completion doesn’t account for exceptions.
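A rough sketch of that at_exit trick follows. The helper name `fork_without_inherited_hooks` is invented for this example, and the status mapping (0 for a normal finish, the requested status for an explicit exit, 1 for an uncaught exception) is an assumption about what "the right status code" means, chosen to mirror Ruby's default behaviour.

```ruby
# Hypothetical helper: forks a child whose exit skips at_exit hooks
# inherited from the parent (for example, one that kills a shared server).
def fork_without_inherited_hooks
  fork do
    # Registered last, so it runs first when the child exits.
    at_exit do
      error = $! # the exception that is ending the process, if any
      status =
        case error
        when nil then 0                     # block finished normally
        when SystemExit then error.status   # explicit Kernel#exit
        else
          warn error.full_message           # exit! would otherwise hide it
          1
        end
      $stdout.flush # exit! does not flush Ruby's buffered output
      exit!(status) # leave now, skipping every remaining at_exit hook
    end

    yield
  end
end
```

Usage would look like `pid = fork_without_inherited_hooks { do_work }` followed by `Process.waitpid2(pid)`, where `do_work` is whatever the child is meant to run.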

The final thing I want to mention is that you may want signals sent to your parent process to reach its child processes as well. One way to arrange this is to call Process.setpgrp in the parent before forking, which “sets the process group ID of the calling process to the process ID of the calling process” (equivalent to setpgid(0, 0)). Child processes inherit the parent process’ group ID, and a signal sent to a process group is delivered to every member of that group. This effectively means the parent process’ group ID is set to its own process ID, its children inherit that group ID, and a signal addressed to the group (for example, Process.kill("-TERM", pgid) in Ruby, where the leading minus targets the group) reaches the parent and all of its children. Note that a signal addressed only to the parent’s process ID is still delivered to the parent alone, so the sender has to target the group. A process group is also not the same thing as a session, which is a collection of process groups.
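As a sketch of the process-group approach: the helper below forks a "leader" that calls Process.setpgrp and then forks some sleeping workers, so the whole group can be signalled at once. The name `spawn_process_group` and the pipe used to wait until the group is ready are illustrative assumptions, and the leader is itself forked here so that the example never signals the script that runs it.

```ruby
# Hypothetical helper: starts a leader process in its own process group,
# plus `worker_count` workers that inherit that group.
# Returns [leader_pid, worker_pids] once the group is fully set up.
def spawn_process_group(worker_count)
  reader, writer = IO.pipe

  leader_pid = fork do
    reader.close
    Process.setpgrp # group ID of this process becomes its own PID
    # Workers inherit the leader's process group ID.
    worker_pids = Array.new(worker_count) { fork { writer.close; sleep } }
    writer.puts(worker_pids.join(" ")) # tell the caller the group is ready
    writer.close
    Process.waitall
  end

  writer.close
  worker_pids = reader.gets.split.map(&:to_i)
  reader.close
  [leader_pid, worker_pids]
end

leader_pid, worker_pids = spawn_process_group(2)

# Every worker reports the leader's PID as its process group ID.
puts worker_pids.map { |pid| Process.getpgid(pid) == leader_pid }.inspect

# A signal name with a leading minus targets the whole process group.
Process.kill("-TERM", leader_pid)
Process.waitpid2(leader_pid)
```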

No doubt there’s more that I missed, so please do let us all know if I’ve left something out.