postgres_migrator
postgres_migrator copied to clipboard
Introduce pgAdmin as a diff engine
First off, great project! I found it on Reddit, while stumbling through the landscape of less-than-ideal migration tooling. It's surprising how little tooling there is for the diff <> generate approach to database migrations.
While trying it out last week I ran into an issue with Migra, that's a bit of a non-starter for my current project's schema. The Supabase team wrote this rather helpful article that outlines their decision to use the pgAdmin schema diff tool, instead of Migra, and have packaged the tool into a CLI made available via docker.
I went ahead and forked the project, swapping pgAdmin in place of migra. Unfortunately the docker images didn't compose well; the pgAdmin docker image requires it be run as a user that's valid with respect to its container, not the host machine. This spurred the thought of running migrator as a binary (not dockerized), that would run the diff via docker.
I'm currently working on a Node.JS project, so the quickest path to distribution was to wrap pre-built binaries in an NPM package:
- https://www.npmjs.com/package/@peykio/migrator
So! Why am I telling you all this?
I was wondering if you had any desire to bring in pgAdmin as a diff engine, perhaps giving users the option to choose between pgAdmin and Migra? There would be some implications to discuss (use of docker, binary distribution, ...), but those could be worthwhile discussions to have.
It's a great concept and implementation, I'd be happy to contribute to the project in the interest of being able to remove the fork.
Hmm these are great questions! And thank you for taking the initiative and being so kind in your contribution :)
In general I'd rather have a single implementation rather than maintaining paths to make it configurable. If pgadmin is unambiguously better than migra I'd rather just use it. Here's my sketch of the implementation that would be most elegant, I'll keep thinking about this and how we could do it easier.
- It looks like pgadmin supports installation using pypi, so the current docker image could just install it instead of migra.
- I'd prefer to only depend directly on pgadmin rather than supabase's fork. The supabase cli isn't very big or complicated, so it doesn't scare me to not use that particular implementation but instead use it as inspiration and mimic what it does directly in
postgres_migrator. - Using something like
pyo3to directly drive pgadmin's diffing functions and skip all the flask nonsense would be preferable to running the diff as a separate cli process, but it all depends. I can be pragmatic here ha. - We can fiddle with docker user stuff if necessary. The only reason my examples use the current user is just so generated migration files are owned by the current user, but that's just a small convenience thing and not critical.
So I guess to answer your question: I'm open to switching! I'll do some more research, and when I get time I'll play around with how to do it nicely.
Thank you for bringing this to my attention!
Thanks for the response, glad you're into it!
I can't say for certain that pgAdmin's diff is unambiguously better, but the fact that both myself and Supabase ran into blocking issues with Migra doesn't exactly inspire confidence (not that n=2 is conclusive...). Without having looked too deep, I'd wager that pgAdmin is maintained by a community that's a bit closer-to-the-heart of Postgres as well.
It looks like Supabase does have an option to diff with migra, we could reach out to them to see why they've left it as an option.
Regarding Docker, I didn't have any luck running migrator without the -u $(id -u ${USER}):$(id -g ${USER}) option, it failed due to lack of permissions. That might have been related to something else though 🤷♂️.
If there's anything you'd like me to help with, let me know!
Your notes on implementation are 👌, controlling pgAdmin via pyo3 is a really interesting prospect!
I played around with the two different tools the other day, and migra generated more "clean" diffs that seemed closer to the original sql. Of course they were still pretty syntactically different, and pgadmin's seemed more correct which is more important. Ultimately it wouldn't be that hard to allow both in the final product, but first I want to get one of them working in a truly programmatic way and then worry about it.
I was also pretty frustrated by the performance and "logging cruft" of the pgadmin tool when being used in the cli form. I definitely would love to not bring all the flask nonsense along for the ride.
The tiny bit of digging suggested all the real diffing work is done by these functions in the pgadmin repo:
fetch_compare_schemascompare_schema_objects
This promises to be difficult though, since pgadmin is deeply structured around flask. It's possible (and in my opinion preferable) to hack/cram/monkeypatch such that we can call the real functions without starting a flask server or building a sqlite database. What I really want is for the pgadmin team to make a more pure flaskless way to call their diffing algorithm!
I'm afraid I think I have to shelve this work for a bit :/ I have some urgent things my deep work time needs to be spent on. I'll get back to this eventually, but I'm not sure when. You are free to dig into that more! As always pull requests are welcome!