rails_event_store
Only use requires_new when needed
SAVEPOINTs are costly: https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/
They were added in fc2ba5693 to stop events with the wrong version from being committed, but if no version is specified they are not needed.
I've not added any specs as I wanted to know your thoughts on this.
I must say that's intriguing — I didn't find a drawback in this solution last time I was briefly thinking about it. But I think such a change would require much more verification and test cases to prove we're not letting in any data corruption.
For sure:
- the wrapping transaction in app code using RES might or might not be there, that's something outside our control
- there are inserts into two tables, `event_store_events` and `event_store_events_in_streams`, that must both either succeed or fail
- `WrongExpectedEventVersion` with a passed `expected_version` is one reason to fail; another may come from a different broken database constraint (event id uniqueness?): what happens in such a case?
Did you have a chance to measure the performance impact on RES with sample payloads/queries in your application? We're only adding one savepoint here.
RES also supports MySQL. Do you perhaps know if the situation there is any different from Postgres?
@paneq Any shower thoughts? 🙃
The included benchmark is not at all convincing to me right now. The conclusion in the article says:
Even if we take into account that the transactions in test 2 are one longer, this is still a performance regression of 60% compared with test 1.
But the author does not include a 3rd test case where transactions are 50% longer (90 groups of statements instead of 60) but without savepoints (or with a different, unimportant statement) to prove that the savepoints are the problem and not the length itself. I think the author expected 50% more operations to take 50% more time. But the test is devised in a way that there is contention on reads, and I am not sure that scales linearly.
So first, I would like to see it reproduced. It does not sound hard but I don't have the time.
The PostgreSQL JDBC driver even has a connection parameter “autosave” that you can set to “always” to automatically set a savepoint before each statement and rollback in case of failure.
Which is also not our case. We don't use it after every Active Record statement. We use it per API call. Clients usually call event_store.publish once or a few times in a transaction.
Also, the original test uses prepared statements, which Rails apps and the event store don't. Every Active Record call from a Rails app makes a network call to PG. I suspect that if there is any performance effect of using savepoints in PG, it's negligible compared to making one more SQL query.
Thanks, yes: that article's benchmark isn't as useful as I first thought, apologies. @paneq how about here, where they make the exact same critique of the benchmark as you and add a select instead?
Our production was almost grinding to a halt; we found that the wait_event SubtransSLRU peaked at such times. We reworked our code to not use SAVEPOINT and the problem went away. Since then I've been keeping an eye on how many SAVEPOINTs we issue, noticed that the number has been increasing, and tracked it down to this gem. I do not believe it is currently a performance issue for us, so I'm unable to provide benchmarks of any use, but I think it could become an issue in the future. As you said, publish causes only one SAVEPOINT, whereas in our previous problem there were over 64 per transaction. There is nothing to stop someone from trying to publish 65 events in one transaction, which would hit this limit.
- there are inserts into two tables, `event_store_events` and `event_store_events_in_streams`, that must both either succeed or fail
but these are in a transaction already, so that will happen, or am I missing something? The existing test shows this: if you remove the subtransaction (i.e. the requires_new) and those test lines, it passes.
The only reasons I can think to use subtransactions are:
- when `ActiveRecord::Rollback` is raised, so that the outer transaction is committed but the inner one fails
- to avoid the inner transaction poisoning the outer one, e.g.
ActiveRecord::Base.transaction do
  user = build_user
  begin
    ActiveRecord::Base.transaction(requires_new: true) do
      # issue some invalid SQL
    end
  rescue ActiveRecord::StatementInvalid
    # rescuing outside the inner block lets Active Record issue
    # ROLLBACK TO SAVEPOINT, so the outer transaction stays usable
  end
  user.save!
end
here the user will always be saved.
Are you able to explain why you are using the subtransaction in the first place?
- `WrongExpectedEventVersion` with a passed `expected_version` is one reason to fail; another may come from a different broken database constraint (event id uniqueness?): what happens in such a case?
Yes, this is something I was very unsure about. Looking at the code I could only see RES refusing to add an event when the version mismatched but I wasn't sure about the other cases. To be able to answer this I'd need to understand why you are using a subtransaction (as above).
Thank you for taking the time to consider this.
Just FYI, I won't have the time soon to commit to this discussion.
I will try to elaborate on that later, but just for the sake of leaving some information: we did encounter a situation in which a lot of savepoints was a serious performance issue (thousands of savepoints de facto brought our application down). That was on MySQL 5.7 (curiously, we did not have that issue on MySQL 5.6).
If you need more convincing that SAVEPOINT can be a bottleneck, GitLab has a new post about why.
Just wanted to say I'm revisiting this PR now. I've re-read linked articles and the comments you've all left in this thread.
Summary of all mentioned problems with SAVEPOINT:
- the cybertec post describes the effect of >64 subtransactions on Postgres (>64 SAVEPOINTs without RELEASE, nested in each other) and its effect on SubtransSLRU
- the GitLab post adds the issue of a single SAVEPOINT within a long-running transaction for Postgres replicas
- @swistak35 shared a story about MySQL 5.7 SAVEPOINT issues that weren't an issue on MySQL 5.6; I cannot tell whether this involved nested SAVEPOINTs (without RELEASEs)
- @nikolai-b shared a story on Postgres and overflowing SubtransSLRU with >64 nested SAVEPOINTs
Correct me if I'm wrong and understood the issues differently than you did.
I also want to respond to one statement about RES's use of SAVEPOINT before digging deeper into how we can perhaps not use SAVEPOINT at all:
As you said, publish causes only one SAVEPOINT, whereas in our previous problem there were over 64 per transaction. There is nothing to stop someone from trying to publish 65 events in one transaction, which would hit this limit.
It is not possible with event_store.append to create nested SAVEPOINTs. As soon as event records are inserted to respective tables, there is a RELEASE. The event_store.publish is an event_store.append + pubsub. Pub-sub triggers only after append, so any event handler appending events will start new SAVEPOINT + RELEASE without being nested in the previous one. That said, calling event_store.append/publish 65 times would not overflow SubtransSLRU — these SAVEPOINTs are not nested within each other.
# executable sample: https://github.com/RailsEventStore/rails_event_store/commit/8dbabc0358a82488c454e3eec807a7d8ed5c9646
BEGIN
SAVEPOINT active_record_1
RELEASE SAVEPOINT active_record_1
SAVEPOINT active_record_1
RELEASE SAVEPOINT active_record_1
SAVEPOINT active_record_1
RELEASE SAVEPOINT active_record_1
SAVEPOINT active_record_1
RELEASE SAVEPOINT active_record_1
SAVEPOINT active_record_1
RELEASE SAVEPOINT active_record_1
SAVEPOINT active_record_1
...
COMMIT
Are you able to explain why you are using the subtransaction in the first place?
I don't think there's a good reason to use a subtransaction in RES as it is currently used. We definitely want to perform the two INSERT statements in one transaction, so we wrap them in one.
Given event_store.append/publish may be called in application code that is already wrapped in a transaction, we should be good to use that one from application code. And you've pointed that out already @nikolai-b.
These three examples perform the same in terms of RES expectations:
require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "activerecord"
  gem "pg"
end

require "active_record"

ActiveRecord::Base.establish_connection(ENV.fetch("DATABASE_URL"))

ActiveRecord::Schema.define do
  create_table :transactions, force: true do |t|
    t.string :label
    t.integer :amount
  end

  create_table :events, force: true do |t|
    t.string :name
    t.string :event_id
  end

  add_index :events, :event_id, unique: true
end

Event = Class.new(ActiveRecord::Base)
Transaction = Class.new(ActiveRecord::Base)

ActiveRecord::Base.logger =
  Logger
    .new(STDOUT)
    .tap do |l|
      l.formatter = proc { |severity, datetime, progname, msg| "#{msg}\n" }
    end
raise unless Transaction.count == 0
raise unless Event.count == 0
# Transaction Count (0.1ms) SELECT COUNT(*) FROM "transactions"
# Event Count (0.1ms) SELECT COUNT(*) FROM "events"
begin
  ActiveRecord::Base.transaction do
    Transaction.create(label: "kaka", amount: 100)

    event_id = SecureRandom.uuid
    Event.create(name: "deposited", event_id: event_id)
    Event.create(name: "deposited", event_id: event_id)
  end
rescue ActiveRecord::RecordNotUnique
end
raise unless Transaction.count == 0
raise unless Event.count == 0
# TRANSACTION (0.0ms) BEGIN
# Transaction Create (0.2ms) INSERT INTO "transactions" ("label", "amount") VALUES ($1, $2) RETURNING "id" [["label", "kaka"], ["amount", 100]]
# Event Create (0.1ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "628db1c3-a802-47bb-bcde-b68898fc8d1f"]]
# Event Create (0.2ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "628db1c3-a802-47bb-bcde-b68898fc8d1f"]]
# TRANSACTION (0.0ms) ROLLBACK
# Transaction Count (0.1ms) SELECT COUNT(*) FROM "transactions"
# Event Count (0.1ms) SELECT COUNT(*) FROM "events"
begin
  ActiveRecord::Base.transaction do
    Transaction.create(label: "kaka", amount: 100)

    ActiveRecord::Base.transaction do
      event_id = SecureRandom.uuid
      Event.create(name: "deposited", event_id: event_id)
      Event.create(name: "deposited", event_id: event_id)
    end
  end
rescue ActiveRecord::RecordNotUnique
end
raise unless Transaction.count == 0
raise unless Event.count == 0
# TRANSACTION (0.0ms) BEGIN
# Transaction Create (0.1ms) INSERT INTO "transactions" ("label", "amount") VALUES ($1, $2) RETURNING "id" [["label", "kaka"], ["amount", 100]]
# Event Create (0.1ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "fcc5c949-c40e-4101-b0ba-60ab49fea0fd"]]
# Event Create (0.1ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "fcc5c949-c40e-4101-b0ba-60ab49fea0fd"]]
# TRANSACTION (0.0ms) ROLLBACK
# Transaction Count (0.0ms) SELECT COUNT(*) FROM "transactions"
# Event Count (0.0ms) SELECT COUNT(*) FROM "events"
begin
  ActiveRecord::Base.transaction do
    Transaction.create(label: "kaka", amount: 100)

    ActiveRecord::Base.transaction(requires_new: true) do
      event_id = SecureRandom.uuid
      Event.create(name: "deposited", event_id: event_id)
      Event.create(name: "deposited", event_id: event_id)
    end
  end
rescue ActiveRecord::RecordNotUnique
end
raise unless Transaction.count == 0
raise unless Event.count == 0
# TRANSACTION (0.0ms) BEGIN
# Transaction Create (0.1ms) INSERT INTO "transactions" ("label", "amount") VALUES ($1, $2) RETURNING "id" [["label", "kaka"], ["amount", 100]]
# TRANSACTION (0.0ms) SAVEPOINT active_record_1
# Event Create (0.1ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "e7edef32-6dba-4935-bc69-ccb4d28aa68e"]]
# Event Create (0.1ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "e7edef32-6dba-4935-bc69-ccb4d28aa68e"]]
# TRANSACTION (0.0ms) ROLLBACK TO SAVEPOINT active_record_1
# TRANSACTION (0.0ms) ROLLBACK
# Transaction Count (0.0ms) SELECT COUNT(*) FROM "transactions"
# Event Count (0.0ms) SELECT COUNT(*) FROM "events"
The only reason I'd personally have preferred requires_new: true in the past was the wording used in the Rails/ActiveRecord documentation and the exclusive use of raise ActiveRecord::Rollback in its examples. It tricked me into thinking it described something else.
This ActiveRecord::Rollback exception is a special one that gets very particular handling in transaction blocks, and it is what makes the three examples listed above behave differently from each other.
However we're not raising ActiveRecord::Rollback in RES, so dropping requires_new: true should not be an issue.
This was pointed out previously by @nikolai-b too:
The only reasons I can think to use subtransactions are: when ActiveRecord::Rollback is raised so that the outer transaction is committed but the inner one fails
I'll rebase this PR next and review existing tests around transactions and expectations described earlier.
In the meantime I've compared use-vs-no-use of SAVEPOINT on Postgres 15.5 and the difference is not going to be noticeable:
Warming up --------------------------------------
with_savepoint 2.938M i/100ms
without_savepoint 2.943M i/100ms
Calculating -------------------------------------
with_savepoint 29.249M (± 0.6%) i/s - 146.915M in 5.023031s
without_savepoint 29.366M (± 0.7%) i/s - 147.173M in 5.012007s
It's still worth not doing something unnecessary. And there are cases where even a single SAVEPOINT is a problem, as described in the GitLab article.
Thank you so much for the time you've invested in this issue, it is very impressive.
The issue we experienced with SAVEPOINT in PostgreSQL is not that it always makes PG slower, but that it can cause issues when combined with other things: replicas, long-running transactions, lots of savepoints. Sorry to link to yet another blog post, but basically SAVEPOINT can bite. I tried modifying your sample code to
perform_100_appends_in_outer_transaction =
  lambda do
    ActiveRecord::Base.transaction do
      id = RailsEventStoreActiveRecord::Event.limit(1).lock("FOR UPDATE").ids[0]
      100.times { event_store.append(mk_event.call) }
      RailsEventStoreActiveRecord::Event.where(id: id).update_all(id: id)
    end
  end
but to cause the performance issue I think it needs to do the update inside the SAVEPOINT. I've also tried using multiple threads to simulate many clients, but still no slowdown.
I don't think the existing SAVEPOINT in RES does much harm (as you release it quickly), but I'd argue that it isn't needed and for that reason should be removed.