TL;DR: AdequateRecord is a set of patches that adds cache stuff to make
ActiveRecord 2x faster

I’ve been working on speeding up Active Record, and I’d like to share what I’ve
been working on! First, here is a graph:

[Graph: iterations per second for Model.find(id) and Model.find_by_name(name) on each stable branch of Rails]

This graph shows the number of times you can call Model.find(id) and
Model.find_by_name(name) per second on each stable branch of Rails. Since it
is “iterations per second”, a higher value is better. I tried running this
benchmark with Active Record 1.15.6 (the version that shipped with Rails
1.2.6), but it doesn’t work on Ruby 2.1.

Here is the benchmark code I used:

require 'active_support'
require 'active_record'
require 'benchmark/ips' # Benchmark.ips comes from the benchmark-ips gem

p ActiveRecord::VERSION::STRING

ActiveRecord::Base.establish_connection adapter: 'sqlite3', database: ':memory:'
ActiveRecord::Base.connection.instance_eval do
  create_table(:people) { |t| t.string :name }
end

class Person < ActiveRecord::Base; end

person = Person.create! name: 'Aaron'

id   = person.id
name = person.name

Benchmark.ips do |x|
  x.report('find')         { Person.find id }
  x.report('find_by_name') { Person.find_by_name name }
end

Now let’s talk about how I made these performance improvements.

What is AdequateRecord Pro™?

AdequateRecord Pro™ is a fork of ActiveRecord with some performance
enhancements. In this post, I want to talk about how we achieved high
performance in this branch. I hope you find these speed improvements to be
“adequate”.

Group discounts for AdequateRecord Pro™ are available depending on the number
of seats you wish to purchase.

How Does ActiveRecord Work?

ActiveRecord constructs SQL queries after doing a few transformations. Here’s
an overview of the transformations:

[Diagram: overview of the transformations]

The first transformation comes from your application code. When you do
something like this in your application:

Post.where(...).where(...).order(..)

Active Record creates an instance of an ActiveRecord::Relation that contains
the information that you passed to where, or order, or whatever you called.
As soon as you call a method that turns this Relation instance into an array,
Active Record does a transformation on the relation objects. It turns the
relation objects into ARel objects which represent the SQL query AST. Finally,
it converts the AST to an actual SQL string and passes that string to the
database.

These same transformations happen when you run something like Post.find(id),
or Post.find_by_name(name).
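
To make the three steps concrete, here is a toy model of the pipeline in plain Ruby. It is only a sketch: ToyRelation and its methods are hypothetical stand-ins I made up for illustration, not the real ActiveRecord::Relation or ARel classes, and the "AST" is just nested arrays.

```ruby
# Toy model of the pipeline described above. ToyRelation is a hypothetical
# stand-in, NOT Active Record's or ARel's real implementation.
class ToyRelation
  attr_reader :table, :wheres, :orders

  def initialize(table, wheres = {}, orders = [])
    @table  = table
    @wheres = wheres
    @orders = orders
  end

  # Step 1: application code only records what was asked for.
  # Each call returns a new relation, so calls can be chained.
  def where(conditions)
    ToyRelation.new(table, wheres.merge(conditions), orders)
  end

  def order(column)
    ToyRelation.new(table, wheres, orders + [column])
  end

  # Step 2: relation -> AST (here, just nested arrays).
  def to_ast
    [:select, table,
     wheres.map { |column, value| [:eq, column, value] },
     orders.map { |column| [:asc, column] }]
  end

  # Step 3: AST -> SQL string.
  def to_sql
    _, table_name, conditions, orderings = to_ast
    parts = ["SELECT * FROM #{table_name}"]
    unless conditions.empty?
      parts << "WHERE " + conditions.map { |_, col, val| "#{col} = #{quote(val)}" }.join(" AND ")
    end
    unless orderings.empty?
      parts << "ORDER BY " + orderings.map { |_, col| "#{col} ASC" }.join(", ")
    end
    parts.join(" ")
  end

  private

  # Naive quoting, good enough for a toy example only.
  def quote(value)
    value.is_a?(String) ? "'#{value.gsub("'", "''")}'" : value.to_s
  end
end

puts ToyRelation.new(:posts).where(author_id: 1).where(title: "hi").order(:id).to_sql
```

Every call to to_sql redoes steps 2 and 3 from scratch, even when only the values change. That repeated work is what the rest of this post is about.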

Separating Static Data

Let’s consider this statement:

Post.find(params[:id])

In previous versions of Rails, when this code was executed, if you watched your
log files, you would see something like this go by:

SELECT * FROM posts WHERE id = 10
SELECT * FROM posts WHERE id = 12
SELECT * FROM posts WHERE id = 22
SELECT * FROM posts WHERE id = 33

In later versions of Rails, you would see log messages that looked something
like this:

SELECT * FROM posts WHERE id = ? [id, 10]
SELECT * FROM posts WHERE id = ? [id, 12]
SELECT * FROM posts WHERE id = ? [id, 22]
SELECT * FROM posts WHERE id = ? [id, 33]

This is because we started separating the dynamic parts of the SQL statement
from the static parts of the SQL statement. In the first log file, the SQL
statement changed on every call. In the second log file, you see the SQL
statement never changes.
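
Here is a small sketch of that separation. The split_query helper is hypothetical (it is not an Active Record method); it just shows that once the values are pulled out as binds, the SQL text is identical from call to call.

```ruby
# Hypothetical helper, not part of Active Record: splits a lookup into a
# static SQL template and the dynamic bind values.
def split_query(table, conditions)
  template = "SELECT * FROM #{table} WHERE " +
             conditions.keys.map { |column| "#{column} = ?" }.join(" AND ")
  binds = conditions.map { |column, value| [column.to_s, value] }
  [template, binds]
end

[10, 12, 22, 33].each do |id|
  template, binds = split_query(:posts, id: id)
  # The template is the same string on every iteration; only binds differ.
  puts "#{template} #{binds.inspect}"
end
```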

Now, the problem is that even though the SQL statement never changes, Active
Record still performs all the transformations we discussed above. In order to
gain speed, what do we do when a known input always produces the same output?
Cache the computation.

Keeping the static data separated from the dynamic data allows
AdequateRecord to cache the static data computations. What’s even more cool is
that even databases that don’t support prepared statements will see an
improvement.
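
A minimal sketch of the idea, assuming the static part can be keyed on the table and the column names. ToyStatementCache is a hypothetical class written for this post, not the code in the branch. The compile method stands in for the expensive relation -> AST -> SQL work, and the counter shows how often it actually runs.

```ruby
# Hypothetical sketch of caching the static part of a query.
# This is NOT the actual AdequateRecord implementation.
class ToyStatementCache
  attr_reader :compile_count

  def initialize
    @templates     = {}
    @compile_count = 0
  end

  # Returns [sql_template, bind_values]. The template is compiled once per
  # (table, columns) pair; only the bind values differ between calls.
  def lookup(table, conditions)
    key = [table, conditions.keys]
    template = @templates[key] ||= compile(table, conditions.keys)
    [template, conditions.values]
  end

  private

  # Stands in for the expensive relation -> AST -> SQL transformations.
  def compile(table, columns)
    @compile_count += 1
    "SELECT * FROM #{table} WHERE " + columns.map { |c| "#{c} = ?" }.join(" AND ")
  end
end

cache = ToyStatementCache.new
[10, 12, 22, 33].each { |id| cache.lookup(:posts, id: id) }
puts cache.compile_count
```

The saving in this sketch is entirely on the Ruby side, which is why it does not depend on the database supporting prepared statements.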

Supported Forms

Not every call can benefit from this caching. Right now the only forms that are
supported look like this:

Post.find(id)
Post.find_by_name(name)
Post.find_by(name: name)

This is because calculating a cache key for these calls is extremely easy. We
know these statements don’t do any joins or have any “OR” clauses, etc. All of
these statements indicate the table to query, the columns to select, and the
where clauses right in the Ruby code.
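
To show why the key is so easy to calculate for these forms, here is a sketch that derives one from nothing but the model name and the call itself. The finder_cache_key helper is hypothetical, and it assumes the primary key column is named "id"; the real branch may build its keys differently.

```ruby
# Hypothetical helper: derive a cache key from a finder-style call.
# Assumes the primary key column is "id". Not Active Record's real code.
def finder_cache_key(model_name, method_name, *args)
  columns =
    case method_name.to_s
    when "find"              then ["id"]
    when "find_by"           then args.first.keys.map(&:to_s)
    when /\Afind_by_(.+)\z/  then $1.split("_and_")
    else raise ArgumentError, "unsupported finder: #{method_name}"
    end
  [model_name, columns]
end

p finder_cache_key("Post", :find, 10)
p finder_cache_key("Post", :find_by_name, "Aaron")
p finder_cache_key("Post", :find_by, name: "Aaron")
```

Notice that the argument values never end up in the key. In this sketch, find_by_name(name) and find_by(name: name) produce the same key, so they could share one cached statement.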

This isn’t to say that queries like this:

Post.where(...).where(...).etc

can’t benefit from the same techniques. In those cases we just need to be
smarter about calculating our cache keys. Also, this type of query will never
be able to match speeds with the find_by_XXX form because the find_by_XXX
form can completely skip creating the ActiveRecord::Relation objects.
The “finder” form is able to skip the translation process completely.

Using the “chained where” form will always create the relation objects, and we
would have to calculate our cache key from those. In the “chained where” form,
we could possibly skip the “relation -> AST” and “AST -> SQL statement”
translations, but you still have to pay the price of allocating
ActiveRecord::Relation objects.

When can I use this?

You can try the code now by using the adequaterecord branch on GitHub. I think
we will merge this code to the master branch after Rails 4.1 has been released.

What’s next?

Before merging this to master, I’d like to do this:

  1. The current incarnation of AdequateRecord needs to be refactored a bit. I have
    finished the “red” and “green” phases, and now it’s time for the “refactor” step.
  2. The cache should probably be an LRU. Right now, it just caches all of the things, when we should probably be smarter about cache expiry. The cache should be bounded by number of tables and combination of columns, but that may get too large.
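
For the second item, here is a minimal sketch of what a bounded LRU could look like. ToyLRUCache is a hypothetical class, not code from the branch. It relies on Ruby hashes preserving insertion order, so the first entry is always the least recently used one.

```ruby
# Hypothetical LRU cache sketch; NOT the cache used in the branch.
# Relies on Ruby's Hash preserving insertion order.
class ToyLRUCache
  def initialize(max_size)
    raise ArgumentError, "max_size must be positive" unless max_size.positive?
    @max_size = max_size
    @store    = {}
  end

  # Returns the cached value for key, or computes it with the block.
  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key) # re-insert to mark as most recently used
      @store[key] = value
    else
      value = yield
      @store[key] = value
      @store.shift if @store.size > @max_size # evict least recently used
      value
    end
  end

  # Keys ordered from least to most recently used.
  def keys
    @store.keys
  end
end

cache = ToyLRUCache.new(2)
cache.fetch([:posts, [:id]])    { "SELECT * FROM posts WHERE id = ?" }
cache.fetch([:posts, [:name]])  { "SELECT * FROM posts WHERE name = ?" }
cache.fetch([:posts, [:id]])    { raise "should already be cached" }
cache.fetch([:posts, [:title]]) { "SELECT * FROM posts WHERE title = ?" }
p cache.keys
```

With a maximum size of 2, the last fetch evicts the name lookup, because the id lookup was touched more recently.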

After merging to master I’d like to start exploring how we can integrate this
cache into the “chained where” form.

On A Personal Note

Feel free to quit reading now. :-)

The truth is, I’ve been yak shaving on this performance improvement for years.
I knew it was possible in theory, but the code was too complex. Finally I’ve
paid off enough technical debt to the point that I was able to make this
improvement a reality. Working on this code was at times extremely depressing.
Paying technical debt is really not fun, but at least it is very challenging.
Some time I will blurrrgh about it, but not today!

Thanks to work (AT&T) for giving me the time to do this. I think we can make
the next release of Rails (the release after 4.1) the fastest version ever.

EDIT: I forgot to add that the newer Post.find_by(name: name) syntax is
supported, so I put it in the examples.
