migration-lock-timeout icon indicating copy to clipboard operation
migration-lock-timeout copied to clipboard

Clarify README on disable_ddl_transaction!

Open jareks-ig opened this issue 2 years ago • 1 comments

Hello and first thanks for your great project!

Could you please expand on information in README: If you use disable_ddl_transaction!, no lock timeout will occur? I don't understand why using disable_ddl_transaction would cancel need for lock timeouts. As I understand regardless if it is used or not, ACCESS EXCLUSIVE lock needs to be obtained on table?

Is this because this gem is designed to work together with PGBouncer in transaction mode?

jareks-ig avatar Dec 15 '23 10:12 jareks-ig

We ran into this in production, though I'm not sure if it really had any negative effects in prod as the table in question doesn't have a lot of activity, so I'm still not sure why this particular migration (adding an index concurrently) waited for so long. DB monitoring tool showed the wait event as virtualxid but I'm not sure what that means.

sdhull avatar Mar 10 '25 18:03 sdhull

Super late response here, but figured I'd leave a comment for anyone who might stumble back on this in the future.

Is this because this gem is designed to work together with PGBouncer in transaction mode?

This predates me a bit, but I do believe this was the primary driver of the original implementation decisions here. Only enforcing the lock_timeout transactionally would be the easiest way to provide this functionality in this setup, and still provides value even if we lose the ability to use this when disable_ddl_transaction! is set.

I do agree that this should be called out a bit more explicitly, though.

I'm still not sure why this particular migration (adding an index concurrently) waited for so long

When Postgres adds an index concurrently, it's able to avoid taking out an AccessExclusiveLock on the relevant table by breaking up the operation into multiple phases:

  • Phase 1: The index is entered in the system catalogue, but is unusable.
  • Phase 2: Postgres waits for all transactions across the database to finish that predate Phase 1. Afterwards, the full table is scanned in order to index every visible row.
  • Phase 3: Postgres once again waits for transactions to finish (this time for those that predate Phase 2). While the index was being built in the previous phase, writes may have added new rows or updated columns that are covered by the index. To address this, another full table scan is performed to index the new data. After this phase, the index is marked as usable.

Ideally, this should mean that concurrent index creation can be run at any time, but in reality, the success of the operation is dependent on general database activity (instead of lock contention on a single table). When this process hits Phase 2 or Phase 3, if there are any long-running transactions that were previously opened, and remain open after we wait for longer than the lock_timeout value, you'll get a PG::LockNotAvailable, and the index is left in an INVALID state.

The virtualxid wait event you're referring to represents the process attempting to get a ShareLock on another transaction's "virtual transaction id", which is how it determines if that transaction has completed or not. Any lock timeouts you'd see as part of a CREATE INDEX CONCURRENTLY would arise from this particular behavior.

lessthanjacob avatar Nov 14 '25 21:11 lessthanjacob