sqlx icon indicating copy to clipboard operation
sqlx copied to clipboard

Postgres migrations: incomplete transaction handling

Open crepererum opened this issue 2 years ago • 5 comments

Abstract

For postgres, the migration script and the _sqlx_migrations state change run in two different transactions, which requires that the migration scripts are idempotent. This is quite hard to archive for certain scripts. It would be nice if both parts would run within the same transaction.

Technical Details

See this code here:

https://github.com/launchbadge/sqlx/blob/d9fd21c94e57db3e02bf2ef6b3a9cc94296ce64a/sqlx-core/src/postgres/migrate.rs#L217-L247

Now imagine that the process dies here:

https://github.com/launchbadge/sqlx/blob/d9fd21c94e57db3e02bf2ef6b3a9cc94296ce64a/sqlx-core/src/postgres/migrate.rs#L229

Now the migration script was executed and committed, but SQLx forgot about that fact. A subsequent migrator run will execute the script again.

References

I'm quite certain that I've observed that issue with InfluxDB IOx (commit 500ece7c138e6b4f04b6761c68bd447ad69a6845) which uses sqlx = 0.6.0 and postgres 13.6.

crepererum avatar Jul 12 '22 11:07 crepererum

I do agree, it seems strange for the insert into _sqlx_migrations to happen outside of the transaction. I believe it's so that execution_time includes the transaction commit (which is where most of the work is done), but it also seems strange to store that in the database in the first place.

abonander avatar Jul 12 '22 21:07 abonander

To keep execution_time while fixing this issue, I would do the insert into _sqlx_migrations inside the transaction with execution_time set to 0, then commit the transaction and update execution_time afterward.

abonander avatar Jul 12 '22 21:07 abonander

To keep execution_time while fixing this issue, I would do the insert into _sqlx_migrations inside the transaction with execution_time set to 0, then commit the transaction and update execution_time afterward.

That sounds like a good way to solve it. Would a nested transaction construct also work?

crepererum avatar Jul 13 '22 08:07 crepererum

Postgres doesn't support nested transactions. We simulate them using savepoints, which wouldn't work for that.

There's PREAPRE TRANSACTION which does most of the work of committing a transaction without fully publishing the changes, but I think that would be more error prone than it's worth. Mainly, if the final COMMIIT PREPARED statement to publish the changes doesn't get executed then the transaction state will be left on disk, including any locks that it held. That could easily deadlock the database since modifying tables requires locking them.

Worst case scenario with the fix I described above, you have migrations with a recorded 0 execution time. Or maybe it should default to -1? The column is NOT NULL.

abonander avatar Jul 13 '22 21:07 abonander

I don't see why migration need that execution time in first place. I just played with migration yesterday and felt awkward with that column. I always think that the execution time for a migration isn't that important, and it'll be difference with each computer run it too so not much point for store that value.

smonv avatar Jul 14 '22 07:07 smonv