September 28, 2020

The Complexity of Active Record Transactions

I’ve recently picked up the sequel-activerecord_connection gem again to make some reliability improvements around database transactions. For context, this gem extends Sequel with the ability to reuse Active Record’s database connection, which should lower the barrier for trying out Sequel in apps that use Active Record.

After pushing some fixes, I was thinking how working on this gem has greatly increased my familiarity with the internals of database transaction implementations of both Active Record and Sequel. Since there aren’t any existing articles on this topic, I thought it would be useful to share the knowledge I gathered over the past few months.

This article will compare the transaction API implementation between Active Record and Sequel, and assumes the reader is already familiar with Active Record’s transaction API usage. As the title suggests, I will be critical of Active Record’s implementation. I know some will perceive this as “not nice”, but I think it’s important to be aware of the internal complexity of libraries we’re using every day (myself included).

Model vs Database

Active Record

Active Record transactions are typically called on the model, which is shown in the official docs as well. I think this can be misleading to novice developers, as it suggests that database transactions are tied to specific database tables, when in fact they’re applied to any queries made by the current database connection.

# opens a connection-wide transaction (unrelated to the `accounts` table)
Account.transaction do
  balance.save!
  account.save!
end

Active Record provides transaction callbacks as part of a model’s lifecycle, allowing you to execute code after the transaction commits or rolls back. This, for example, allows you to spawn a background job after a record is persisted, but wait until the transaction commits to ensure the record is up-to-date when the background job is picked up.

class Account < ActiveRecord::Base
  after_create_commit :send_welcome_email

  private

  def send_welcome_email
    AccountMailer.welcome(self).deliver_later
  end
end

In my opinion, this approach has several issues. For one, it encourages putting business logic into your Active Record models, and generally increases complexity of the model lifecycle. It’s unfortunately not trivial to use transaction callbacks outside of models, because they’re coupled to models (although there are gems that work around that).

Transaction callbacks can also negatively impact memory usage if you’re allocating many model instances within a transaction, as references to these model instances are held until the transaction is committed or rolled back, which prevents Ruby from garbage collecting them beforehand. Active Record will do this for any model that has any transaction callbacks defined.

class Comment < ActiveRecord::Base
  after_commit :deliver_new_mentions, on: [:create, :update], if: :body_changed?

  private

  def deliver_new_mentions
    MentionNotificationJob.deliver_later(self)
  end
end

ActiveRecord::Base.transaction do
  author.comments.find_each do |comment|
    # Even though we're not triggering the `after_commit` callback here, Active
    # Record will still keep references to these model instances, preventing
    # Ruby from garbage collecting them until the transaction is closed.
    comment.update(author: new_author)
  end
end

Sequel

In Sequel, the transaction API is implemented on the database object, which is completely decoupled from models.

DB = Sequel.connect(adapter: "postgresql", database: "myapp") #=> #<Sequel::Database ...>

# calling #transaction on the database object communicates it's connection-wide
DB.transaction do
  balance.save
  account.save
end

Sequel also has transaction hooks, but they too are defined on the database object, and aren’t tied to models in any way – they’re just blocks of code that get executed after the transaction is committed or rolled back. This makes them possible to use in business logic that lives outside of models (of course, in that case one can also just move the code outside of the transaction block).

class CreateAccount
  def call(attributes)
    DB.transaction do
      account = Account.create(attributes)
      send_welcome_email(account)
      account.update(api_key: SecureRandom.hex)
    end
  end

  private

  def send_welcome_email(account)
    # queue email delivery after the enclosing transaction commits
    DB.after_commit do
      AccountMailer.welcome(account).deliver_later
    end
  end
end

And if you really want to register transaction hooks on the model level, you can do that inside regular model lifecycle hooks:

class Account < Sequel::Model
  def after_create
    db.after_commit { AccountMailer.welcome(self).deliver_later }
  end
end

By giving us the ability to compose APIs this way, the Sequel::Model class was able to remain unaware of the existence of transaction hooks (which keeps it simpler), but we were still able to achieve the same functionality as we have with Active Record.

Note that the examples above will still keep references to the model instances until the transaction is closed. However, Sequel’s API gives us the necessary control to change that. For example, we can choose to register a transaction hook only if a certain condition holds (useful in use cases like file attachments):

class Comment < Sequel::Model
  def after_save
    if column_changed?(:body)
      db.after_commit { MentionNotificationJob.deliver_later(self) }
    end
  end
end

DB.transaction do
  author.comments_dataset.paged_each do |comment|
    # The transaction hooks aren't registered, so Ruby can garbage collect
    # these model instances while the loop is running.
    comment.update(author: new_author)
  end
end

We can also register a transaction hook in a way where it will only keep the reference to the record id instead of the whole record instance:

class MentionNotifications
  def self.enqueue(comment_id)
    db.after_commit { MentionNotificationJob.deliver_later(comment_id) }
  end
end

class Comment < Sequel::Model
  def after_save
    NotificationMentions.enqueue(id)
  end
end

Transaction state

Active Record

Active Record maintains transaction state on the connection level, but a lot of transaction-related state is also maintained at the model level. While the transaction manager is implemented pretty decently, the ActiveRecord::Transactions module is incredibly complex, and has been the source of numerous issues.

The reason for this complexity is that every new incoming bug has generally been solved by adding yet another tweak, yet another conditional, yet another instance variable. And some of these instance variables even leak outside of the ActiveRecord::Transactions module, which indicates a leaky abstraction.

Honestly, for me this reached a state where I don’t consider Active Record’s transaction callbacks to be safe enough for production, and I try to avoid them whenever possible.

Sequel

Sequel stores all the transaction state in a single @transactions instance variable on the database object. Models don’t have access to the transaction state, which keeps transactions fully decoupled from models.

DB.transaction do |conn|
  DB.after_commit { ... }
  DB.transaction(savepoint: true) do
    DB.instance_variable_get(:@transactions)[conn] #=>
    # {
    #   after_commit: [
    #     <Proc...> # the block we've registered above
    #   ],
    #   savepoints: [
    #     { ... }, # transaction data
    #     { ... }  # savepoint data
    #   ]
    # }
  end
end

If you’re reading Sequel’s transaction code, you’ll notice that all of it is contained in a single file and single context (including transaction hooks). In my experience this made the logic much easier to grok.

Lazy transactions

Active Record

In version 6.0, Active Record introduced a performance optimization that makes transactions lazy. What this means is that Active Record will issue BEGIN/COMMIT queries only if there was at least one query exected inside the transaction block.

ActiveRecord::Base.transaction do
  ActiveRecord::Base.connection.execute "SELECT 1"
end
# BEGIN
# SELECT 1
# COMMIT

ActiveRecord::Base.transaction do
end
# (no queries were executed)

The main use case behind this addition seems to be saving lots of records whose attributes didn’t change, where each attempted update would execute empty BEGIN/COMMIT statements (even though no UPDATE was issued), which didn’t perform well. A workaround at the time would be to call record.save if record.changed? instead.

article = Article.find(id)
article.published #=> true
article.update(published: true) # executed empty BEGIN/COMMIT prior to Active Record 6.0

However, as Sean Griffin had pointed out in the pull request review, this added significant complexity for very little gain. In addition to requiring additional transaction state, each Active Record adapter is now also responsible for materializing transactions when necessary.

Sequel

In Sequel, opening a transaction will always execute BEGIN/COMMIT statements (if the transaction commits), regardless of whether any queries were made inside the block or not.

DB.transaction do
end
# BEGIN
# COMMIT

Sequel::Model#save behaves differently than ActiveRecord::Base#save, in terms that it always executes an UPDATE statement for an existing record (updating all columns). To update only changed attributes, you would use Sequel::Model#save_changes, which doesn’t execute UPDATE if no attributes have changed. And Sequel::Model#update calls #save_changes under the hood:

article = Article.find(id)
article.published #=> true
article.update(published: true) # no queries executed

Unlike ActiveRecord::Base#save, Sequel::Model#save_changes doesn’t open a transaction if it won’t execute the UPDATE statement. This seems like a much more elegant solution to the problem Active Record’s lazy transactions intended to solve, but with none of the complexity.

Final words

I really care that libraries I’m using at work have sufficiently straightforward internals that I can understand when debugging an issue. When it comes to database transactions, Active Record’s internal complexity is just too overwhelming for me (and that’s coming from someone who contributes to open source on a daily basis).

On the other hand, the Sequel’s transaction implementation was fairly straightforward to understand, which is all the more impressive considering that it’s more feature-rich compared to Active Record (see the docs). And this is not an exception – I regularly see this pattern whenever I’m reading Sequel’s source code

Hopefully this article will add another point towards Sequel for people starting new Ruby/Rails projects.

The Complexity of Active Record Transactions

Model vs Database

Active Record

Sequel

Transaction state

Active Record

Sequel

Lazy transactions

Active Record

Sequel

Final words

Janko Marohnić