Add support to cancel/force rollback a migration
In managed environments like AWS RDS or GCP CloudSQL, the migration has to be run from a separate machine. A network issue, EC2/VM failure, or similar can then drop the DB connections and fail the migration in turn. This is likely to leave the migration in an inconsistent state where it can neither complete nor roll back, so it has to be cleaned up manually. Please add an abort/cancel/force-rollback option that doesn't stop on consistency errors when a migration was only partially completed.
Thank you for opening this, @danzika. Do you have any logs of the errors you saw that you can share? They would help diagnose the issue.
Overall, I agree it should be possible to recover from a failed start, either with the rollback command or something else.
Actually, I don't have logs for this. We just discussed this feature yesterday at a meetup with @tsg, so this is a feature request/enhancement rather than an actual bug I have hit :)
Nice! Thank you for opening this. I believe that, as of today, it's already possible to roll back a failed start. We make rollback operations idempotent so they can be run against intermediate states of start. Let's keep the issue open to double-check this before closing it!
Amazing! Thanks for the instant reaction. If this is already implemented, it brings the tool one step closer to production readiness. Kudos!
Hi @danzika, nice to see you here!
In the event of a migration failure (e.g., network issues, VM crash), the migration remains in an "active" state. You can re-run pgroll rollback from any machine that can connect to the database; the tool does not require state on the original host. This design makes it possible to recover from most partial failures without manual cleanup.
The rollback code is designed to be idempotent and robust against partial application. Test cases in the repo simulate failures during migration and assert that the active migration is properly cleared and the database is left in a consistent state (example test).
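To illustrate what "idempotent and robust against partial application" means in practice, here is a minimal, self-contained Go sketch. It is not pgroll's actual code: the `schema` type, the `start` and `rollback` functions, and the object names are all hypothetical, and the database is simulated with an in-memory set. The point is only that a rollback written as "remove if present" can be run against any intermediate state, and run more than once.

```go
package main

import (
	"errors"
	"fmt"
)

// schema is a toy stand-in for the database: the set of temporary objects
// that a migration start has created so far. (Hypothetical type.)
type schema map[string]bool

// errConnectionLost simulates the connection dropping mid-migration.
var errConnectionLost = errors.New("connection lost")

// start creates the objects one by one and fails once failAfter of them
// exist, leaving the schema partially migrated. A negative failAfter
// never fails.
func start(db schema, objects []string, failAfter int) error {
	for i, name := range objects {
		if i == failAfter {
			return errConnectionLost
		}
		db[name] = true
	}
	return nil
}

// rollback removes every object the migration might have created.
// Deleting a missing key is a no-op (similar in spirit to DROP ... IF EXISTS),
// so it is safe to call on any intermediate state and to call repeatedly.
func rollback(db schema, objects []string) {
	for _, name := range objects {
		delete(db, name)
	}
}

func main() {
	// Illustrative object names only, not pgroll's real naming scheme.
	objects := []string{"tmp_column", "sync_trigger", "version_view"}
	db := schema{}

	err := start(db, objects, 2) // fails before the third object is created
	fmt.Println("start error:", err, "| objects left behind:", len(db))

	rollback(db, objects)
	rollback(db, objects) // a second run is harmless
	fmt.Println("objects after rollback:", len(db))
}
```

Because each step tolerates the object already being absent, the rollback does not need to know how far the failed start got, which is also why it can be issued from a different machine than the one that started the migration.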
As @exekias responded before, the rollback is designed to recover from partial/inconsistent states, but does not currently offer a "force" flag to aggressively clean up irrecoverable situations. Your request for an explicit "force abort" option makes sense for cases where even rollback cannot proceed due to severe inconsistency.
Let us know if you have an example of a case where the standard rollback fails and manual cleanup is unavoidable, so we can prioritize and design this feature accordingly!