td-berlin Tech Blog

Improving test performance for Ruby and Mongoid

Sascha Knobloch
• ruby, rails, mongoid, mongodb, and testing

tl;dr

Recently we implemented a small gem called MongoidCleaner. It’s a faster alternative to DatabaseCleaner for Ruby projects using MongoDB along with Mongoid. Besides the truncate strategy, it also provides a more performant drop strategy.

$ gem install mongoid_cleaner

When using RSpec, adjust your spec_helper.rb:

RSpec.configure do |config|
  config.before(:suite) do
    MongoidCleaner.strategy = :drop
  end

  config.around(:each) do |example|
    MongoidCleaner.cleaning do
      example.run
    end
  end
end

Introduction

Everyone agrees that code should be covered by tests. With the rising popularity and awareness of TDD in recent years, there has been quite a debate about whether tests should hit the database or not. In fact, the majority of the tests written for our Ruby and MongoDB powered backend application, whose main task is to expose APIs for storing and extracting data, touch the database. Why? Because that’s what this particular piece of software is built for: interacting with the underlying database. Reading, writing, aggregating, calculating, validating, et cetera.

Don’t worry, this article won’t take sides in the whole “TDD / should my tests hit the database or not” discussion. Instead, I want to show how a one-line change led to entire test suites running 2-4 times faster.

Ensuring clean state

So, if you are going down the path of actually persisting objects to the database in your tests, you will want to ensure a clean state along the way. In Ruby-Wonderland you can certainly find a library which handles that for you. The most popular one is DatabaseCleaner, and we’ve been using it with good results in most of our Rails projects.

At TD we rely heavily on peer reviews via pull requests on GitHub. We use Travis CI to build our GitHub projects, and a Travis build is triggered for each pull request. Growing code bases and a rising number of projects, hence a greater number of tests and eventually bigger build matrices, caused the Travis queue to clog up and builds to wait until others had finished. At this point we started to investigate ways to speed up our tests.

We realised that cleaning the database after each test consumed most of the time, which led us to investigate the subject more closely. In particular, it came down to the part which is responsible for clearing collections:

# some foo before
collections.each { |c| session[c].find.remove_all }
# some bar after

remove_all is a method implemented in Moped which ultimately results in a call to MongoDB’s remove() command. It deletes documents one by one, and MongoDB must update every index associated with the collection in addition to the data itself. Depending on the size of the collection, this can become a very expensive operation. Apparently it is also expensive on small collections, as in common test scenarios, when the command runs a couple of hundred or even a thousand times.

Other ways to ensure clean state

In order to find a more performant replacement for the remove() command, I consulted the MongoDB docs. And behold, they suggest:

To remove all documents from a collection, it may be more efficient to use the drop() method to drop the entire collection, including the indexes, and then recreate the collection and rebuild the indexes.

Well, I don’t care much about the indexes in tests, and we don’t have any logic depending on indexes. As for recreating the collection: MongoDB creates a collection implicitly when it’s first referenced in a command. Neat.

Okay, so the docs’ prescription sounded promising, and I thought it was worth going for drop() instead.
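
To make the difference a bit more tangible, here is a toy in-memory model. It is not Moped or MongoDB code: the ToyCollection class and its operation counter are hypothetical and exist only for this illustration. It counts how many entries have to be touched when documents are removed one by one (data plus every index entry), compared with discarding the whole collection in one step.

```ruby
# Toy model only: counts the entries touched by each cleaning strategy.
class ToyCollection
  attr_reader :operations

  def initialize(index_fields)
    @index_fields = index_fields
    @operations = 0
    reset_storage
  end

  def insert(doc)
    @documents << doc
    @index_fields.each { |field| @indexes[field] << doc[field] }
  end

  # Mimics remove(): every document is deleted individually, and
  # every index entry belonging to it is deleted as well.
  def remove_all
    until @documents.empty?
      doc = @documents.pop
      @operations += 1
      @index_fields.each do |field|
        @indexes[field].delete_at(@indexes[field].index(doc[field]))
        @operations += 1
      end
    end
  end

  # Mimics drop(): data and indexes are discarded in a single step.
  def drop
    reset_storage
    @operations += 1
  end

  def count
    @documents.size
  end

  private

  def reset_storage
    @documents = []
    @indexes = Hash.new { |hash, field| hash[field] = [] }
  end
end

removed = ToyCollection.new(%i[email name])
dropped = ToyCollection.new(%i[email name])

100.times do |i|
  doc = { email: "user#{i}@example.com", name: "User #{i}" }
  removed.insert(doc)
  dropped.insert(doc)
end

removed.remove_all
dropped.drop

puts "remove_all touched #{removed.operations} entries" # 100 docs * (1 + 2 indexes) = 300
puts "drop touched #{dropped.operations} entry"         # 1
```

The real cost of drop() and remove() depends on the server and its storage engine, so the numbers above only illustrate why per-document work adds up; they are not a measurement.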

Replacing remove() with drop()

Time for comparing. First, run the tests with the old remove_all method:

$ bundle exec rspec

Randomized with seed 14074
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Finished in 5 minutes 41 seconds (files took 1.73 seconds to load)
480 examples, 0 failures

Second, change the code shown above as follows:

# some foo before
collections.each { |c| session[c].drop }
# some bar after

As you can see, the replacement is rather trivial. Now run the tests again:

$ bundle exec rspec

Randomized with seed 32186
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Finished in 1 minute 47.94 seconds (files took 1.71 seconds to load)
480 examples, 0 failures

We went down from 341 seconds to ~108 seconds, which is a performance improvement by a factor of ~3.2. I’d say that’s a nice achievement for such a small change. In general, we observed an improvement by a factor of 2-4 throughout our projects.
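
For anyone who wants to check the arithmetic, the factor can be recomputed from the two "Finished in" lines of the RSpec output. The variable names below are only illustrative.

```ruby
# Timings copied from the two RSpec runs shown above.
before_seconds = 5 * 60 + 41     # "5 minutes 41 seconds" with remove_all
after_seconds  = 1 * 60 + 47.94  # "1 minute 47.94 seconds" with drop

speedup = before_seconds / after_seconds
saved   = before_seconds - after_seconds

puts format("speedup: %.2fx", speedup)              # speedup: 3.16x
puts format("saved:   %.2f seconds per run", saved) # saved:   233.06 seconds per run
```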

Considerations

The performance of commands like drop() and remove() is tightly connected to several things:

If you are using MongoidCleaner and run into any sort of weird behavior or don’t get performance gains, please let us know.

Problems

There are two things which I think can lead to problems:

  1. Sharded environments
  2. Logic determined by indexes

Neither is (yet) the case for us. Once we have to deal with either of those issues, we will update this post.
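
To illustrate the second point: in MongoDB a uniqueness constraint is enforced by a unique index, and dropping a collection drops its indexes as well. The toy model below is not MongoDB or Mongoid code; ToyUniqueCollection and DuplicateKeyError are hypothetical names made up for this sketch. It shows how a test that relies on a duplicate being rejected would behave differently after a drop.

```ruby
# Toy model only: a collection whose uniqueness rule lives in an index.
class DuplicateKeyError < StandardError; end

class ToyUniqueCollection
  def initialize
    @documents = []
    @unique_fields = []
  end

  def create_unique_index(field)
    @unique_fields << field unless @unique_fields.include?(field)
  end

  def insert(doc)
    @unique_fields.each do |field|
      if @documents.any? { |existing| existing[field] == doc[field] }
        raise DuplicateKeyError, "duplicate value for #{field}"
      end
    end
    @documents << doc
  end

  # Like drop(): removes the documents and the index definitions.
  def drop
    @documents = []
    @unique_fields = []
  end

  def count
    @documents.size
  end
end

users = ToyUniqueCollection.new
users.create_unique_index(:email)
users.insert(email: "a@example.com")

rejected_before_drop =
  begin
    users.insert(email: "a@example.com")
    false
  rescue DuplicateKeyError
    true
  end

users.drop
users.insert(email: "a@example.com")
users.insert(email: "a@example.com") # accepted: the index is gone

puts "duplicate rejected before drop: #{rejected_before_drop}" # true
puts "documents after drop: #{users.count}"                    # 2
```

In a real suite, one possible remedy would be to recreate the required indexes after cleaning (Mongoid models expose create_indexes for this), at the price of giving back some of the speed gained.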

Having said that: Please feel free to comment on this post!