dbmate icon indicating copy to clipboard operation
dbmate copied to clipboard

Exclusive lock during migration

Open marton78 opened this issue 5 years ago • 7 comments

Hi,

as far as I can tell, currently there is now explicit lock applied during migrations. That would mean that it is dangerous to run migrations in a concurrent setting, e.g. when several instances of some API server including dbmate are running.

It would be advisable to obtain an exclusive lock of the migrations table before migrating.

marton78 avatar Nov 04 '19 11:11 marton78

This is a good feature request, I've discussed it with others in person too. Happy to accept a PR adding this.

amacneil avatar Nov 04 '19 16:11 amacneil

Sorry but I need to pass, I have zero knowledge of Golang :)

marton78 avatar Nov 04 '19 20:11 marton78

No problem, I think someone else is already working on this. If not I will keep on the backlog.

amacneil avatar Nov 04 '19 21:11 amacneil

I miss this also. Is somebody working on it already? Otherwise I’d be happy to give it a go (pun intended, hehe). If so, I see a few paths that could be taken, and I’d just like to check what’s your opinion before moving forward :)

Firstly, do you think this would be better as a default or an opt-in? I’m thinking of something like a --lock or --concurrent flag. To me making it default would make sense though as it shouldn't really affect backwards compatibility of usage, right? And I don't really see a use case where you'd want to opt out or? Possibly to avoid deadlocks but.. idk, sounds unlikely. Maybe an opt-out flag would be better for that case?

An approach I'm thinking of is a two-phase locking based on hostname where one host acquires the lock, runs pending migrations and releases the lock again - so the next host can do the same after waiting for the first lock to release. In the case of a deadlock this makes it easy to introspect the database and find which host caused it, as opposed to using something like one-time generated ids. Possibly one could make a hybrid, for the case of someone running concurrent migrations on the same host? Or maybe that's just over engineering it?

Finally, I'm thinking of how to solve testing. This feature would need two dbmate implementations to run synchronised if it should be tested againts a real database i guess? So maybe we could add another testing phase in docker, that tests this particular feature? Additionally to some unit tests. Whats your thoughts on this?

Cheers!

palmenhq avatar Mar 17 '20 19:03 palmenhq

Hey, contribution sounds like a great idea!

I think it would be fine to make this behavior the default. I can't even think of a good reason to support opt-out: it should be completely transparent to people unless you happen to try to run multiple dbmates at the same time. Let me know if you can think of a good reason to expose this to users, but my initial reaction is that it should just be default behavior to prevent concurrent migrations.

For the implementation: I think it's going to be database specific. I'll talk about postgres because that's what I'm most familiar with. There are a couple primitives you could look into - the LOCK TABLE statement, and advisory locks (pg_advisory_lock). LOCK TABLE seems like it might be harder to use because it only lasts for the life of a transaction, and we don't hold a transaction open while updating all migrations (we open a transaction for each migration). Advisory locks I think you can use at the session level, which is probably exactly what we need (basically, once you start the dbmate up command, you grab the lock, and then release it after all migrations are complete (or maybe you can just let it automatically expire once the process dies?).

I don't follow your question about naming the locks based on hostname. I would think that you can just grab a lock with a global static name like dbmate_migration, and then anyone else running dbmate on any other hosts will be prevented from proceeding until after you release it.

Since the above are postgres-specific features, you would need to see what options exist for sqlite and mysql, but I assume there are similar features available.

Testing: Highest priority would be some unit tests to ensure the lock is acquired and released when expected. There's a separate question about having some integration tests which actually exercise dbmate concurrently to check for issues - I think this would be nice to have, but it's probably something along the lines of a script which can be run manually to look for concurrency errors. Long term it would also be nice to have a suite of integration tests which work against a dbmate binary and cover all functionality, but we don't have that yet. I wouldn't worry as much about solving this for the initial implementation.

amacneil avatar Mar 18 '20 00:03 amacneil

Sounds great!

Sorry for the unclarity, I was thinking of building the lock mechanism by having a table called something like change_lock or so - where we could store for example the hostname, and based on that each script can judge whether it has an acquired lock or not really yet. In regards of advisory locking that looks really neat though! Didn't really think of that. After some digging it seems mysql has something similar called GET_LOCK. Sqlite on the other hand.. seems trickier, as that as i understand it uses exclusive locking on a file/database level for all writes (as File Locking And Concurrency In SQLite Version 3), and I can't really find a way to acquire the database lock manually for the session (only per transaction). That being said running concurrent migrations against a single database file seems like a weird case no? But also skipping concurrency handling for one driver seems bad. Maybe a manual locking mechanism would be the way for this particular driver?

As for testing - alright, amaze!

palmenhq avatar Mar 18 '20 08:03 palmenhq

Would this project be open to a PR that adds a --concurrent-id <number> flag, and only supports certain drivers for now?

  • We could add validation that would error out if an unsupported driver is chosen along with this flag.
  • The locking logic would only be run when this flag is specified.
  • The <number> would allow users to ensure the locking ID is unique enough for their usage of dbmate and the backing database.

snarlysodboxer avatar Jul 18 '22 23:07 snarlysodboxer