Improving test performance for Ruby and Mongoid
• ruby, rails, mongoid, mongodb, and testing
tl;dr
Recently we implemented a small gem called MongoidCleaner. It’s a faster alternative to DatabaseCleaner for Ruby projects that use MongoDB with Mongoid. Besides the truncate strategy, it also provides a more performant drop strategy.
When using RSpec, adjust your spec_helper.rb:
Introduction
Everyone agrees that code should be covered by tests. With the rising popularity of TDD in recent years, there has been quite a big debate about whether tests should hit the database or not. In fact, most of the tests written for our Ruby and MongoDB powered backend application, whose main task is to expose APIs for storing and extracting data, do touch the database. Why? Because that’s what this particular piece of software is built for: interacting with the underlying database. Reading, writing, aggregating, calculating, validating, et cetera.
Don’t worry, this article won’t take sides in the whole “TDD / should my tests hit the database or not” discussion. Instead, I want to show how a one-line change made our entire test suites run 2-4 times faster.
Ensuring clean state
So, if you go down the path of actually persisting objects to the database in your tests, you will want to ensure a clean state between tests. In Ruby-Wonderland you can certainly find a library that handles this for you. The most popular one is DatabaseCleaner, and we have been using it happily in most of our Rails projects.
At TD we rely heavily on peer reviews via pull requests on GitHub. We use Travis CI to build our GitHub projects, and every pull request triggers a Travis build. Growing code bases and a rising number of projects, and hence more tests and bigger build matrices, clogged up the Travis queue: builds had to wait until others had finished. At this point we started to investigate ways to speed up our tests.
We realised that cleaning the database after each test consumed most of the time, so we investigated the subject more closely. In particular, the time was spent in the part responsible for clearing collections:
remove_all is a method implemented in Moped, the MongoDB driver used by Mongoid 3 and 4, and it ultimately results in a call to MongoDB’s remove() command. That command deletes documents one by one, and MongoDB must update every index associated with the collection in addition to the data itself. Depending on the size of the collection, this can become a very expensive operation. Apparently it is also expensive on small collections, as in common test scenarios, when the command runs a couple of hundred or even thousand times.
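A toy model can make this cost argument more concrete. The sketch below is a hypothetical in-memory stand-in (ToyCollection is an invented name, not Moped or MongoDB code): removing documents one by one while maintaining every index costs roughly documents × (1 + indexes) units of work, whereas dropping the collection is a single step regardless of its size. What actually happens inside mongod is, of course, far more involved.

```ruby
# Toy model only: this is NOT how MongoDB or Moped work internally.
# It counts abstract "work units" to compare a per-document remove with a drop.
class ToyCollection
  attr_reader :documents, :indexes, :work

  def initialize(doc_count:, index_count:)
    @documents = Array.new(doc_count) { |i| { _id: i } }
    # Each index is modelled as a Hash from _id to document.
    @indexes = Array.new(index_count) { @documents.to_h { |doc| [doc[:_id], doc] } }
    @work = 0
  end

  # Mimics remove(): delete documents one by one and update every index each time.
  def remove_all
    until @documents.empty?
      doc = @documents.pop
      @work += 1 # delete the document itself
      @indexes.each do |index|
        index.delete(doc[:_id])
        @work += 1 # maintain one index entry
      end
    end
  end

  # Mimics drop(): throw away data and index definitions in a single step.
  def drop
    @documents = []
    @indexes = []
    @work += 1
  end
end

removed = ToyCollection.new(doc_count: 100, index_count: 3)
removed.remove_all
removed.work # => 400, i.e. 100 documents * (1 + 3 indexes)

dropped = ToyCollection.new(doc_count: 100, index_count: 3)
dropped.drop
dropped.work # => 1
```

In a suite that empties every collection after every example, that per-document, per-index work is paid over and over.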
Other ways to ensure clean state
In order to find a more performant replacement for the remove() command, I consulted the MongoDB docs. And behold, they suggest:
To remove all documents from a collection, it may be more efficient to use the drop() method to drop the entire collection, including the indexes, and then recreate the collection and rebuild the indexes.
Well, I don’t care much about indexes in tests, and we don’t have any logic that depends on them. As for recreating the collection: MongoDB creates a collection implicitly when it is first referenced in a command. Neat.
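The implicit recreation that makes drop() practical for test cleanup can be mimicked with a Ruby Hash whose default block creates an entry on first access. ToyDatabase is a hypothetical name used only for illustration; it does not model the real driver API.

```ruby
# Toy model of implicit collection creation (not the MongoDB driver API):
# referencing a collection name that does not exist yet creates it on the spot.
class ToyDatabase
  def initialize
    @collections = Hash.new { |hash, name| hash[name] = [] }
  end

  # Insert a document, creating the collection implicitly if needed.
  def insert(name, document)
    @collections[name] << document
  end

  # Drop a collection entirely; it disappears from the database.
  def drop(name)
    @collections.delete(name)
  end

  def collection_names
    @collections.keys
  end

  # Read-only lookup: uses fetch so that counting does not create a collection.
  def count(name)
    @collections.fetch(name, []).size
  end
end

db = ToyDatabase.new
db.insert("users", { name: "alice" })
db.drop("users")
db.collection_names                 # => []
db.insert("users", { name: "bob" }) # recreated implicitly on first write
db.collection_names                 # => ["users"]
db.count("users")                   # => 1
```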
Okay, so the docs’ recommendation sounded promising, and I thought it was worth trying drop() instead.
Replacing remove() with drop()
Time for a comparison. First, run the tests with the old remove_all method:
Second, change the code shown above as follows:
As you can see, the replacement is rather trivial. Now run the tests again:
We went down from 341 seconds to ~108 seconds, a speedup of roughly 3x (341 / 108 ≈ 3.16). I’d say that’s a nice achievement for such a small change. In general, we observed speedups of 2-4x across our projects.
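The change itself boils down to calling drop instead of remove_all on each collection. The following is a minimal, hypothetical sketch of a cleaner supporting both strategies; SketchCleaner and FakeCollection are invented for illustration and are not the actual MongoidCleaner or DatabaseCleaner code. FakeCollection merely records which call it received, standing in for a driver collection object.

```ruby
# Hypothetical sketch: NOT the actual MongoidCleaner or DatabaseCleaner code.
# FakeCollection stands in for a driver collection and records cleanup calls.
class FakeCollection
  attr_reader :name, :calls

  def initialize(name)
    @name = name
    @calls = []
  end

  def remove_all # stand-in for a per-document remove
    @calls << :remove_all
  end

  def drop # stand-in for dropping the whole collection
    @calls << :drop
  end
end

class SketchCleaner
  STRATEGIES = %i[truncate drop].freeze

  def initialize(collections, strategy: :drop)
    unless STRATEGIES.include?(strategy)
      raise ArgumentError, "unknown strategy: #{strategy.inspect}"
    end

    @collections = collections
    @strategy = strategy
  end

  # The essence of the change: drop each collection instead of emptying it.
  def clean
    @collections.each do |collection|
      # Skip MongoDB's internal system.* collections, which must not be dropped.
      next if collection.name.start_with?("system.")

      if @strategy == :drop
        collection.drop
      else
        collection.remove_all
      end
    end
  end
end

collections = [FakeCollection.new("users"), FakeCollection.new("system.indexes")]
SketchCleaner.new(collections, strategy: :drop).clean
collections.map(&:calls) # => [[:drop], []]
```

Keeping the strategy as a constructor argument lets a project fall back to the truncate behaviour, for example if some tests turn out to depend on indexes.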
Considerations
The performance of commands like drop() and remove() is tightly connected to several things:
- Structure of your data: Relations, number of fields, embedded documents, …
- Usage of indexes
- Configuration of mongod, and even of the MongoDB driver or Mongoid
If you are using MongoidCleaner and run into any sort of weird behavior or don’t get performance gains, please let us know.
Problems
There are two things which I think can lead to problems:
- Sharded environments
- Logic determined by indexes
Neither applies to us (yet). Once we have to deal with either of these issues, we will update this post.
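To illustrate the second point: after a drop, MongoDB recreates the collection implicitly on the next write, but with only the default _id index. A test that relies on a unique index to reject duplicates would then silently let duplicates through. The toy class below (UniqueIndexedCollection and DuplicateKeyError are hypothetical names, not MongoDB or Mongoid code) shows the effect and why such indexes would have to be rebuilt after each drop.

```ruby
# Toy model, not MongoDB: unique "indexes" are just field names checked on insert.
class DuplicateKeyError < StandardError; end

class UniqueIndexedCollection
  attr_reader :documents

  def initialize(*unique_fields)
    @documents = []
    @unique_fields = unique_fields
  end

  # Reject a document whose value for a uniquely indexed field already exists.
  def insert(doc)
    @unique_fields.each do |field|
      if @documents.any? { |existing| existing[field] == doc[field] }
        raise DuplicateKeyError, "duplicate key on #{field}: #{doc[field].inspect}"
      end
    end
    @documents << doc
  end

  # Like drop(): removes the documents AND the index definitions.
  def drop
    @documents = []
    @unique_fields = []
  end

  # A suite relying on the index would have to rebuild it after each drop.
  def create_unique_index(field)
    @unique_fields << field
  end
end

users = UniqueIndexedCollection.new(:email)
users.drop
users.insert({ email: "a@example.com" })
users.insert({ email: "a@example.com" }) # accepted: the unique index is gone
users.documents.size # => 2

users.drop
users.create_unique_index(:email)
users.insert({ email: "a@example.com" })
begin
  users.insert({ email: "a@example.com" })
rescue DuplicateKeyError => e
  puts e.message # prints: duplicate key on email: "a@example.com"
end
```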
Having said that: Please feel free to comment on this post!