Fork
If you’re running a heavy job that can be split into many independent units of work in a thread-safe manner (I’m assuming you have all those tricky details worked out, or you’re willing to work through them), you probably want to parallelise it.
Unfortunately, Matz’s Ruby Interpreter (MRI) has a GIL that prevents more than one thread from executing Ruby code at a time. It abstracts the OS’ native threads away and uses a scheduling system to make sure that only one Ruby thread is running at any given moment (with a few exceptions that likely aren’t relevant here).
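To see what this means in practice, here’s a minimal sketch (assuming MRI) that runs the same CPU-bound work sequentially and then in two threads; under the GIL, the threaded version takes roughly as long as the sequential one:

```ruby
require 'benchmark'

def cpu_work
  200_000.times { |i| Math.sqrt(i) }
end

# Run the work twice, back to back.
sequential = Benchmark.realtime { 2.times { cpu_work } }

# Run it in two threads. Under MRI's GIL, CPU-bound threads can't execute
# Ruby code simultaneously, so this won't be meaningfully faster.
threaded = Benchmark.realtime do
  2.times.map { Thread.new { cpu_work } }.each(&:join)
end

puts format('sequential: %.2fs, threaded: %.2fs', sequential, threaded)
```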
Bummer!
Not to worry, though. Ruby programs can fork themselves to create child processes, and since each child is an independent Ruby process with its own GIL, you can run as many of them in parallel as you like.
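The basic shape looks like this (a minimal sketch): `fork` runs its block in a new child process, and the parent waits for the children to finish:

```ruby
pids = 4.times.map do |i|
  fork do
    # Each child is a copy of the parent with its own interpreter and GIL,
    # so these blocks can genuinely run in parallel across cores.
    puts "child #{i} running as pid #{Process.pid}"
  end
end

pids.each { |pid| Process.wait(pid) }  # reap every child
```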
I knew that there were performance limitations to MRI due to the GIL, but I didn’t realise just how much faster your code could be if you parallelised it. At work recently, I took charge of implementing a new system to handle vehicle images and consolidate all the different ways that we currently process images through our app.
Our users import a huge number of vehicles through our app every day, and many of those have several images each. We calculated roughly how many images would go through our app every day and it was a large number (~100k at least), but we figured that could be mostly solved using `Last-Modified` and `ETag` headers.
The real problem was the initial influx of images that we’d have to migrate from a `vehicle_pictures` table to the polymorphic `document_assets` table that we already have in place for other models. We use CarrierWave.
I did a test run of ~800 images on our QA server and it took ~72 minutes. Evidently not good enough. Using `ruby-prof`, I determined that the majority of our time was spent in CarrierWave, in places we couldn’t really do much about, so I decided to have a look at parallelising the task to see how far we could bring the time down.
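For reference, wrapping the work in `ruby-prof` looks roughly like this (a sketch: `images` and `migrate_image` are hypothetical stand-ins for our records and migration step):

```ruby
require 'ruby-prof'

result = RubyProf.profile do
  images.first(50).each { |image| migrate_image(image) }  # hypothetical helper
end

# A flat, time-sorted report makes the hotspots (for us, CarrierWave
# internals) stand out.
RubyProf::FlatPrinter.new(result).print($stdout)
```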
My initial implementation got it down to ~20 minutes. Not bad, but then I noticed I had accidentally forked once for each image rather than once for each batch. Well, damn! Even with all that per-process overhead, the improvement was pretty good. Another attempt, this time creating two processes for each core of the machine (some research online indicated that for IO-intensive tasks like this one, you want at least that many so there’s always something ready to keep each core busy), yielded ~10 minutes.
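The batched version looked roughly like this; it’s a sketch rather than our production code, `migrate_image` is again a hypothetical helper, and the ActiveRecord reconnect is there because forked children can’t share the parent’s database connection:

```ruby
require 'etc'

images       = VehiclePicture.all.to_a        # assumption: AR model backing vehicle_pictures
worker_count = Etc.nprocessors * 2            # two workers per core for IO-heavy work
batch_size   = (images.size / worker_count.to_f).ceil

images.each_slice(batch_size) do |batch|
  fork do
    ActiveRecord::Base.establish_connection   # children need their own connection
    batch.each { |image| migrate_image(image) }
  end
end

Process.waitall  # block until every child has exited
```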
The second thing I noticed from the profiling was that a lot of time was being spent in Excon, the HTTP client used by Fog (written by the same author), which CarrierWave in turn uses to interface with S3. After benchmarking various Ruby HTTP clients, Excon turned out to be the slowest by a wide margin. However, even given the improvement the fastest client promised, I didn’t deem it worth the work to write our own CarrierWave storage backend.
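The comparison was along these lines (a sketch with a stand-in URL rather than our real S3 traffic):

```ruby
require 'benchmark'
require 'excon'     # gem install excon
require 'net/http'

url = 'https://example.com/'  # stand-in endpoint

Benchmark.bm(10) do |bm|
  bm.report('excon')    { 20.times { Excon.get(url) } }
  bm.report('net/http') { 20.times { Net::HTTP.get_response(URI(url)) } }
end
```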
Luckily, our problems were not unique: I discovered a gem called `carrierwave-aws` that interfaces directly with `aws-sdk` instead of Fog, and this brought our time down to ~4.5 minutes. I actually managed to get it down to ~2m37s, but retested with only the number of processes I knew we could afford in production, giving us our final ~4.5m figure.
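Swapping the backend is mostly configuration; going by the gem’s README, it looks roughly like this (reading credentials from environment variables is my assumption):

```ruby
CarrierWave.configure do |config|
  config.storage         = :aws
  config.aws_bucket      = ENV['S3_BUCKET']   # assumption: bucket name via env var
  config.aws_acl         = 'private'
  config.aws_credentials = {
    access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
    secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
    region:            ENV['AWS_REGION']
  }
end
```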
My story is not over yet, though. Automated testing of the multiprocess code has come back to bite me: the previously happy test examples for this class no longer pass, even though manual testing shows the same results as before. Parallel code can be tricky to reason about, but the payoff can definitely be worth it.