rails Adapter specific advisory lock (support for PostgreSQL per schema locking)

Adapter specific advisory lock (support for PostgreSQL per schema locking)

Open leshik opened this issue 2 years ago • 14 comments

Summary

This is to revive #40251.

For PostgreSQL, the migrator should use both database and schema when creating a lock so concurrent migrations can be executed across schemas (e.g. parallel_migration_threads option in apartment gem).

Other Information

The main issue with #40251 was introducing PostgreSQL-specific logic to the migrator. This pull moves lock_id generation logic to the abstract adapter. It allows creation of adapter-specific generate_advisory_lock_id methods.

For example, in case of PostgreSQL pg_try_advisory_lock() function can accept either single signed 64-bit integer lock argument, or two signed 32-bit integers. The latter is a perfect case for using it with PostgreSQL schemas.

/cc @eileencodes @tgxworld @marksiemers

Ref #31627 Ref influitive/apartment#506 Ref rails-on-services/apartment#147 Ref rails-on-services/apartment#149

Oct 21 '21 03:10 leshik

@kamipo Could you have a look? Moving to a compound key is quite good way to work with PG schemas

Jan 02 '22 10:01 noma4i

Im slightly desperate about this PR to be reviewed. @rafaelfranca could you have a look?

Jan 05 '22 11:01 noma4i

@drewtempelmeyer hey! Could you have a look?

Jan 18 '22 12:01 noma4i

@ryanswood check it plz

May 03 '22 16:05 noma4i

@noma4i Changes look good to me though what really is needed to get this PR noticed by someone with merge capabilities. Though, I see a few Rails members tagged.

May 04 '22 23:05 ryanswood

@eileencodes - Would you be willing to review?

May 07 '22 13:05 marksiemers

Thanks for opening a pull request.

The latter is a perfect case for using it with PostgreSQL schemas.

Would you explain why the latter "two signed 32-bit integers" is preferred over the current "single signed 64-bit integer" implementation? I'd like to know if this is mandatory.

Hi @yahonda - Thank you for taking a look.

There is detail in the referenced PRs, and I'll summarize below: Ref https://github.com/rails/rails/issues/31627 Ref https://github.com/influitive/apartment/issues/506 Ref https://github.com/rails-on-services/apartment/issues/147 Ref https://github.com/rails-on-services/apartment/pull/149

In summary, performing operations like a database migration in parallel should be safe, as long as the operations are in different schemas.

Apartment takes advantage of the isolation that schemas provide, which means an application that uses apartment can end up with thousands of database schemas (which have the same structure, but different data).

So, it is extremely helpful to be able to run a migration across many schemas at once. This allows a migration to complete in orders of magnitude more quickly. And without incorporating the two integers (one for database, one for schema), we are stuck running migrations serially (and slowly) for any apps that use apartment.

Jun 15 '22 18:06 marksiemers

Thanks for the explanation. Now I understand two integers are necessary, one for database and another one for the schema.

Now I'm taking a look at the code and I am still not sure if it is a good idea that ActiveRecord::ConnectionAdapters::AbstractAdapter#generate_advisory_lock_id has actual implementation while ActiveRecord::ConnectionAdapters::AbstractAdapter#supports_advisory_locks? returns false.

This is because this pull request is moving some implementation from ActiveRecord::Migrator that have no concept of database specific implementation.

What do you think about ActiveRecord::ConnectionAdapters::AbstractAdapter#generate_advisory_lock_id not implenting anything or raiseing NotImplementedError and let each connection adapter implements generate_advisory_lock_id?

Jun 21 '22 13:06 yahonda

I have a broad concern here around whether it's reasonable to assume that migrations will only affect a single schema. That feels like an apartment-centric assumption, so I'm inclined to help them introduce that idea rather than adopt it wholesale in Active Record.

Jun 21 '22 13:06 matthewd

Would you explain why the latter "two signed 32-bit integers" is preferred over the current "single signed 64-bit integer" implementation?

I don't feel like this question has been answered -- indeed, I don't believe it has an answer.

Within the assumption framework presented, there's reason that the lock identifier should be scoped to the schema, but that doesn't require us to use the two-argument form of the function -- we could trivially combine the two values ourselves (and keep a salt, in the process), thereby keeping the API more consistent.

Jun 21 '22 13:06 matthewd

I am thinking that if generate_migrator_advisory_lock_id or db_name_hash inside the generate_migrator_advisory_lock_id can be overwritten cleanly by the connection adapters or something, it may satisfy the apartment specific requirement and Rails compatibility.

Jun 21 '22 14:06 yahonda

As far as I understand, apartment gem has its own connection adapters for each database, https://github.com/rails-on-services/apartment/blob/development/lib/apartment/adapters/postgresql_adapter.rb

Just moving generate_advisory_lock_id method to the connection adapters may be enough? I have pushed a WIP commit https://github.com/rails/rails/compare/main...yahonda:rails:another_advisory_lock

Jul 05 '22 11:07 yahonda

@yahonda I have reviewed your idea. Good idea to introduce generate_advisory_lock_id at abstract adapter. Bad part - you have copy-pasted code for both mysql and pg adapters which renders good idea to be bad as it doesn't work. PG adapter should contain postgresql specific logic for db+schema name lock.

Jul 21 '22 15:07 noma4i

It is intentional that https://github.com/rails/rails/compare/main...yahonda:rails:another_advisory_lock does not change any advisory logic for PostgreSQL adapter.

This commit just moves generate_advisory_lock_id method from ActiveRecord::Migration to ActiveRecord::ConnectionAdapters::AbstractAdapter and its children ActiveRecord::ConnectionAdapters::PostgreSQLAdapter and ActiveRecord::ConnectionAdapters::AbstractMysqlAdapter.

By doing that apartment or whatever gems that inherit connection adapters can implement their own generate_advisory_lock_id based on their own requirements.

Jul 25 '22 12:07 yahonda

Would you explain why the latter "two signed 32-bit integers" is preferred over the current "single signed 64-bit integer" implementation?

I don't feel like this question has been answered -- indeed, I don't believe it has an answer.

Within the assumption framework presented, there's reason that the lock identifier should be scoped to the schema, but that doesn't require us to use the two-argument form of the function -- we could trivially combine the two values ourselves (and keep a salt, in the process), thereby keeping the API more consistent.

@matthewd - Thank you for the followup on this. Are you suggesting something like this:

def generate_advisory_lock_id
  db_name_hash = Zlib.crc32("#{current_schema}:#{current_database}")
  Migrator::MIGRATOR_SALT * db_name_hash
end

It sounds like the steps forward are:

A PR with @yahonda 's work is created (and hopefully approved and merged)
The code above (or similar variant) is implemented in the apartment gem for the Postgresql Adapter

Does this sound correct, @yahonda @matthewd @noma4i and @ryanswood

Aug 15 '22 00:08 marksiemers

around a year to make 2 to be 1+1.

@marksiemers Look at https://github.com/rails/rails/compare/main...yahonda:rails:another_advisory_lock it's putting MySQL logic back to PostgreSQL. Should be not MySQL specific or just as proposed in this PR - be a separate one

Aug 15 '22 12:08 noma4i

To make sure - @yahonda should not be merged as it's MySQL-centric. He is about to copy-paste MySQL behaviour for no reason, which is just useless https://github.com/yahonda/rails/blob/66f4944a05732fae0cb2b81076a80c32213153f7/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb#L429

Aug 15 '22 17:08 noma4i

PostgreSQL ALWAYS uses a schema. Even if you want to go without one, it's always there. Just use it.

Aug 15 '22 17:08 noma4i

Let me clarify my point. Rails migration task allows changing multiple schema objects. Therefore, we cannot change the advisory lock scope to per schema by default.

I understand there are some gems like apartments wants to run Rails migration per schema, so changing the generate_advisory_lock_id location from ActiveRecord::Migration to ActiveRecord::ConnectionAdapters can allow each gem to override the generate_advisory_lock_id method if necessary.

Mar 16 '23 06:03 yahonda

@yahonda please review this code changes as this drives the misunderstanding. The purpose of this change is to move code from activerecord/lib/active_record/migration.rb to abstract_adapter.rb. which will allow the further overrids per-adapter and not enforce any change in the default migration logic. This change won't affect any logic as it's just the code been moved within the Rails codebase. Additionally, there is PG-specific code added to generate locks based on a more granular schema-based approach instead of a single 64-bit integer.

From the Postgresql doc: PostgreSQL provides a means for creating locks that have application-defined meanings. These are called advisory locks, because the system does not enforce their use — it is up to the application to use them correctly https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS

This change is not specific to any solution like "Apartment." The aim of this pull request is to fix the error of misusing the advisory lock generator where the original author was not aware of system-wide schema use in Postgres.

Mar 23 '23 12:03 noma4i

We are ok to resolve the conflicts and refactor the code if the approach is accepted @yahonda

Mar 23 '23 12:03 noma4i

Additionally, there is PG-specific code added to generate locks based on a more granular schema-based approach instead of a single 64-bit integer.

This is not a reasonable change, and we're not going to make it.

The fact that a Postgres database contains schemas, and always has a current_schema, is irrelevant. It is incorrect to assume that two connections with different current_schema values will not interfere with each other during migration. References to tables and other objects inside DDL statements can be "schema qualified". Such "schema qualified" identifiers refer to, and thus potentially manipulate, objects that live in a schema other than current_schema.

If you believe it is safe to assume that migrations will only affect the current schema, and therefore it is safe to scope migration locks to current_schema, that must be an application-level concern. (Or, optionally, an assumption imposed by some library that is not Rails.)

Mar 23 '23 14:03 matthewd

@matthewd could you give an example of that inference? I have checked with pg docs and have googled explicitly for postgres schema inference can't find any valid one.

PR will allow the same behaviour as it's now while relaxing the rules. As for now, we can observe the hardcode while the approach of this change will allow more flexibility.

I'm cool with the current state of the code. I was waiting for years for the ConnectionHandling to be changed to support multiple connections while facing the same critique: who needs that? Should be app-specific! Write your own gem.. etc

Mar 23 '23 14:03 noma4i

rails rails copied to clipboard

Adapter specific advisory lock (support for PostgreSQL per schema locking)

Summary

Other Information

rails
rails copied to clipboard