
Difficulties integrating into project

Open ericwooley opened this issue 1 year ago • 14 comments

Hello!

I love the project and the goal.

However, as it stands, it's pretty hard to integrate into a project with multiple branches and developers.

Presumably, each developer, working on a different branch, would have different migration files. Something like this:

migrations
├── create_book.json        # dev 1 on branch book, which was branched off main
├── create_checkouts.json   # dev 2 on branch checkouts, which was branched off book
└── create_library.json     # dev 3 on branch library, which was branched off main

When they all get merged, it's hard to know what's been merged and in what order to run them.

I've resorted to scripting the creation of migrations so that they have timestamps and run in the order the developers created them. That seems to work OK, but the migration script on pull down then needs to figure out which migrations to run, and which ones to skip, for each environment, and there doesn't seem to be a way to check that using the CLI. Currently I'm having to resort to querying the migrations table myself to see what needs to be run.
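For reference, the timestamp-prefix scheme described here can be sketched as follows (the helper name and the example file names are my own illustration, not part of pgroll):

```python
from datetime import datetime, timezone


def migration_filename(name: str) -> str:
    """Prefix a migration name with a UTC timestamp so that plain
    alphanumeric sorting matches creation order across branches."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{stamp}_{name}.json"


# Files named this way sort chronologically with a plain sort():
files = [
    "20241101120000_create_book.json",
    "20241102090000_create_checkouts.json",
    "20241101150000_create_library.json",
]
print(sorted(files))
```

The timestamp only has to be generated once, when the developer creates the migration, so merge order across branches stops mattering.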

Another option I considered was storing individual operations in folders based on release, but this also gets hard to manage, as different releases might ship different features, which would require moving files around as part of the release process.

Guidance on this would be helpful, or possibly allowing a status query on individual migration files, so that I can check whether each migration has already been run and skip it if so.

I'm using this for a typical SaaS deployment process for now, so most of the time I will just be running pgroll start <file> --complete. But I like the idea of using this for gnarlier future migrations as well, when necessary.

ericwooley avatar Nov 01 '24 21:11 ericwooley

Thank you for opening this one @ericwooley!

We recently discussed this, and this is a possible approach we could take:

  • Migration files must live in a well-known (configurable) folder, e.g. pgroll
  • Migration files should sort by name, e.g. by using timestamps (as you do) or numbering (001_, etc.)
  • We introduce a pgroll migrate subcommand that applies all pending migrations to the database and leaves the last one started (without complete). This command will auto-complete any previous migration until it reaches the last one in the pgroll folder. You would run this command from your CI to ensure the DB has all the migrations in place, with the last one started, so the previous version is still available for older application instances.
  • We introduce a pgroll latest_version_schema command (better name pending?) to ease the task of configuring clients. This command will output the schema name to be used in the search_path to access the latest version (using local files in the pgroll folder as the source of truth).
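As a rough model of the proposed migrate semantics in the third bullet (pure planning logic; the names and the (migration, action) representation are illustrative, not pgroll's API):

```python
def plan_migrate(all_migrations: list[str], applied: list[str]) -> list[tuple[str, str]]:
    """Return (migration, action) pairs: every pending migration is
    started and completed, except the last, which is only started so
    that the previous schema version stays available."""
    pending = [m for m in all_migrations if m not in applied]
    plan = [(m, "start+complete") for m in pending[:-1]]
    if pending:
        plan.append((pending[-1], "start"))
    return plan


print(plan_migrate(["m0", "m1", "m2", "m3"], ["m0", "m1"]))
# -> [('m2', 'start+complete'), ('m3', 'start')]
```

Note that under this model, with m0 and m1 applied, m2 is auto-completed on the way to m3, which is the behaviour discussed further down the thread.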

I believe these two new commands would ease local development and continuous deployment.

Please let us know what you think; we are really interested in your feedback!

exekias avatar Nov 13 '24 12:11 exekias

@exekias

Those sound like great improvements. I believe they would relieve the issues I'm running into.

Thanks for getting back!

Do you want me to leave the issue open? Looks like it's assigned and part of a milestone

ericwooley avatar Nov 13 '24 15:11 ericwooley

Yes, let's leave this issue open to add the migrate and latest_version_schema (or whatever we end up calling them) commands.

andrew-farries avatar Nov 13 '24 15:11 andrew-farries

Hey guys, quick question on this for @andrew-farries and @exekias. I want to integrate pgroll into my project; migrate and latest look almost perfect for my use case, but I just want to confirm some behaviour.

Given the migration sequence m0 -> m1 -> m2 -> m3, with my running application currently at m1: when running pgroll migrate (without --complete), will the initial schema (m1) and the final schema (m3) remain, or will it be as specified in your initial response under bullet 3 ("This command will auto-complete any previous migration until it reaches the last one in the pgroll folder"), meaning m2 and m3 are available?

If the latter, could I suggest changing the behaviour to the former if possible, keeping m1 and m3 available? I believe this would better follow the aims of the project by allowing apps to continue running while migrations roll out.

wtaylor avatar Dec 07 '24 16:12 wtaylor

Hi @wtaylor 👋

Given the migration sequence of m0 -> m1 -> m2 -> m3 and my running application is currently at m1: when running pgroll migrate (without --complete), will it be the initial schema (m1) and the final schema (m3) remaining or will it be as specified in your initial response under bullet 3: "This command will auto-complete any previous migration until it reaches the last one in the pgroll folder" meaning m2 and m3 available?

In this case it will be m2 and m3 that are available.

If the latter, could I suggest a change in behaviour to the former if possible to keeping m1 and m3 available. I believe this would follow the aims of the project better in allowing apps to continue running while rolling out updates to migrations.

pgroll currently requires that the previous migration is completed before the next migration can be started. In our experience this reflected how engineering teams most commonly apply migrations, but we've heard enough feedback since our initial release to suggest that allowing multiple migrations to be active simultaneously is desirable behaviour for pgroll.

Multiple active migrations would allow us to accommodate the scenario you describe, with m1 and m3 (and possibly m2 as well) being available after a migrate. https://github.com/xataio/pgroll/issues/249 is the issue that tracks this.

andrew-farries avatar Dec 12 '24 07:12 andrew-farries

@ericwooley we recently released v0.8 of pgroll, which includes the migrate and latest commands.

It would be good to hear how far these additions go towards solving your problems integrating pgroll into your projects.

andrew-farries avatar Dec 12 '24 07:12 andrew-farries

@andrew-farries Tried out 0.8 last night, great set of QOL features!

I'll gist it when I'm done, but I'm putting together a small Python wrapper script to get the behaviour I'm looking for. The quick version is:

  1. Query current schema version
  2. Find unapplied migrations by naively listing the migrations folder alphanumerically
  3. Combine unapplied migrations into a single "compound" migration file by concatenating the operations together
  4. pgroll start compound_migration.json
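Step 3 above, concatenating the operations of the pending migrations, might look roughly like this, assuming pgroll's JSON migration files carry a top-level operations array as discussed in this thread (verify the exact key names against the pgroll docs for your version):

```python
import json
from pathlib import Path


def build_compound(migration_paths: list[Path], name: str) -> dict:
    """Concatenate the `operations` arrays of several migration files
    into one compound migration, preserving file order."""
    operations = []
    for path in migration_paths:
        migration = json.loads(path.read_text())
        operations.extend(migration["operations"])
    return {"name": name, "operations": operations}
```

The resulting dict can then be written out and handed to pgroll start as a single migration.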

I couldn't find anything in the docs about operations specifically; am I right in assuming they're applied in sequential order?

wtaylor avatar Dec 12 '24 10:12 wtaylor

Yes, the operations in a migration are applied in the order in which they appear in the migration.

Squashing multiple migrations into a single migration like this would be useful behaviour to build into pgroll itself.

In order for this to work reliably though we first need to address https://github.com/xataio/pgroll/issues/239, which limits how useful multi-operation migrations are in practice.

andrew-farries avatar Dec 12 '24 11:12 andrew-farries

Yup, ran into #239 pretty quickly! :smiling_face_with_tear:

Anyhow, I created a wrapper script. I'm awful at Python, but it seems to work; sharing in case it's useful to anyone else: Gist

The script is meant to be deployed before the app code, and it will then:

  1. Run pgroll init only if the db is uninitialised
  2. Apply migrations:
     a. If no previous migrations have been applied (post init), use pgroll migrate
     b. If a single migration is outstanding, apply it with pgroll start
     c. If multiple migrations are outstanding, create a single "compound" migration combining the operations of all outstanding migrations
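The branching in step 2 boils down to a small decision function; a sketch (the function and its inputs are hypothetical, though the pgroll commands it names are the real ones discussed above):

```python
def choose_action(initialised: bool, applied: list[str], available: list[str]) -> str:
    """Pick a pgroll invocation strategy given the database state."""
    if not initialised:
        return "pgroll init"
    pending = [m for m in available if m not in applied]
    if not pending:
        return "nothing to do"
    if not applied:
        # Fresh database (post init): let pgroll migrate apply everything.
        return "pgroll migrate"
    if len(pending) == 1:
        return f"pgroll start {pending[0]}"
    # Multiple outstanding migrations: combine them into one file first.
    return "pgroll start compound_migration.json"
```

Keeping this as a pure function over (initialised, applied, available) makes the wrapper easy to unit-test without a database.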

The script is just lacking a --complete flag for the post-rollout phase of the app; I'll add that in the near future. My use case is to run it as a K8s job before and after the rollout of a new version of the app code.

The script will get a lot more useful when #239 gets a bit more finished. On the other hand, if the "keep the previous and latest schema versions available" behaviour gets added to pgroll migrate, then this script will be pretty much obsolete.

wtaylor avatar Dec 18 '24 22:12 wtaylor

We are looking forward to integrating pgroll, but we are blocked on this issue. We have multiple developers working independently on different features, each creating migrations. It is really important for us to know whether a fix is planned for the near future.

Can you guys let us know the timeline you are targeting for fixing this issue in your v1 milestone? @andrew-farries

akashwar avatar Mar 10 '25 13:03 akashwar

What are you blocked on specifically, @akashwar?

Are the pgroll migrate and pgroll latest commands added and referenced in this issue enough, or do you still run into the difficulties discussed here?

andrew-farries avatar Mar 27 '25 07:03 andrew-farries

@andrew-farries We are blocked on support for multiple developers working independently on different features. However, the added commands and the suggestions in the comments above should serve the use case.

akashwar avatar Apr 10 '25 12:04 akashwar

I know it's been a while, but life has been crazy. I'll get back to trying this out on my next project.

In the meantime, in my own custom tooling, I've been experimenting with a few ideas, and I thought it might give you guys something to think about regarding tooling. I'm not convinced they are great ideas, though:

Migration Generator

  1. Start a pg test container
  2. Run all migrations against it
  3. Compare its schema against some existing db
  4. Generate a migration and write it out as a migration script

To me, this would be a huge win, because I could use GUI tools like DBeaver to muck with a local dev database, then generate a migration and modify it as needed.
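A toy version of the schema-compare step (3) and operation generation (4), assuming schemas are modeled as simple table-to-columns mappings; a real diff tool would also need to handle drops, type changes, constraints, and indexes:

```python
def diff_schemas(current: dict[str, list[str]], desired: dict[str, list[str]]) -> list[dict]:
    """Generate pgroll-style operations that take `current` to `desired`.
    Only handles new tables and new columns; the operation shapes here
    are illustrative, not pgroll's exact format."""
    ops = []
    for table, columns in desired.items():
        if table not in current:
            ops.append({"create_table": {"name": table, "columns": columns}})
            continue
        for col in columns:
            if col not in current[table]:
                ops.append({"add_column": {"table": table, "column": col}})
    return ops
```

The point is that the generated operations land in an editable migration file, so the developer can still tweak them by hand before committing.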

Verify Mode

The second idea also has to do with test containers: a "verify mode". Essentially it would look something like pgroll migrate --verify, which would:

  1. Connect to the db I'm about to run migrations on
  2. Perform a dump, and restore it to a test container
  3. Run the migrations on the test container
  4. Run a migration verification script on the test container
  5. Perform the migrations on the real db

The migrations would probably look something like this:

0001-migration-name-up.sql
0001-migration-name-down.sql
0001-migration-name-verify.sql

where the verify script is run to check that there is no data loss or anything else detrimental.
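The verify flow could be orchestrated so that the real database is only touched after verification passes; a sketch with the five steps injected as callables (all names are hypothetical, and --verify is not an existing pgroll flag):

```python
from typing import Callable


def migrate_with_verify(
    dump_and_restore: Callable[[], None],  # steps 1-2: copy the real DB into a test container
    run_migrations: Callable[[], None],    # step 3: migrate the test container
    verify: Callable[[], bool],            # step 4: run the verification script
    migrate_real: Callable[[], None],      # step 5: migrate the real DB
) -> bool:
    dump_and_restore()
    run_migrations()
    if not verify():
        return False  # abort: the real database is never touched
    migrate_real()
    return True
```

Structuring it this way means each step (container setup, pgroll invocation, verification SQL) can be swapped out or stubbed in tests independently.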


I'm only bringing up these ideas because it's something I would love to see in a db migration tool, where I can deploy with extreme confidence. I'm not sure if it belongs in this tool or a meta tool of some kind, but I wanted to throw the ideas out there for y'all.

ericwooley avatar Apr 22 '25 14:04 ericwooley

Hello @ericwooley, Thank you for these ideas, much appreciated!

The workflow you described for pgroll migrate --verify is very much in line with what we plan to offer as part of the new Xata platform. We’re introducing a migration request flow where users can create short-lived branches to apply schema changes to a throwaway environment and verify (dry-run) their changes. Personally, I think having this as part of a broader platform makes a lot of sense. That said, we’re also considering how the insights from this experience could feed back into pgroll as a standalone tool.

I hope this helps clarify our thinking a bit and thank you again for taking the time to share your ideas!

gulcin avatar May 04 '25 08:05 gulcin