Implement a simple migration command

Open stratosgear opened this issue 1 year ago • 16 comments

I tried to understand how the project handles schema migrations but after reading all the documentation pages regarding migrations and browsing through related existing open/closed issues, I have not found a concrete explanation of how it works.... :(

In my use case I have introduced procrastinate in an existing code base (non Django based).

I have executed: procrastinate -a my.src.app schema --apply that correctly applied the procrastinate structures required. Procrastinate seems to be working fine.

My concern now is how I can remain current with all potential migrations that might be coming along in the future.

I was hoping that I would be able to keep executing procrastinate -a my.src.app schema --apply every time I update my python dependencies and during the project startup, and hopefully automatically catch any potential future migrations required, but I am not sure if this will actually work.

Am I right in thinking that I somehow have to adopt any new procrastinate migration scripts as my own and find a way to apply them myself with my existing migration methodology (basically using Alembic)?

Because this is something really fragile and will require a lot of coordinated work to implement, test and maintain. This will increase the friction of adopting procrastinate too much! :(

Am I missing an obvious solution?

stratosgear avatar Apr 17 '24 08:04 stratosgear

I think you're right, except if you use Django.

When we worked on the migration system, we wanted to keep it a bit minimal to avoid tying it to a specific system, since there were multiple existing systems and choosing one would likely have made it very difficult for people using another one. Since migration systems usually come with their own way of tracking migrations, it would be complicated to add our custom way of tracking what has been run or not.

The one main thing we commit to doing is that each release lists its migrations. Each migration script is written so as to be runnable as-is (if you need to modify it, it's probably a bug), and most migration systems accept migrations where you give the SQL code to run directly, so they should be compatible. Also, we've made a dedicated Django integration that lists procrastinate migrations as Django migrations.

But then you're right, nothing yet has been made to ease that part of the lib.

We could do the same with Alembic as we do with Django, I guess, it would probably cover most of what people would be using. I've never played with alembic so I'd love if someone would like to have a look.

ewjoachim avatar Apr 17 '24 17:04 ewjoachim

Although I have not done the full analysis, why can't Procrastinate handle its own migrations?

For example Alembic, keeps a table where it notes what was the last migration script that was applied.

This process, or something similar, could be maintained internally by Procrastinate, and NOT be the responsibility of the user to take care of through an external dependency. Procrastinate already provides a schema manager, in the form of the original cli that applies the schema, so why not extend it a bit so that whenever it runs, it checks for any missing migrations and applies them, otherwise gracefully exits mentioning that everything is up to date.

I mean maybe I oversimplify things, but it seems Procrastinate already deals with much heavier concepts here, auto-handling the migrations should be peanuts! :)

stratosgear avatar Apr 17 '24 18:04 stratosgear

We could, but... I don't like doing in one lib things that [I feel] might get quite complicated and is something complex enough that I would imagine there would be dedicated other libs to do it right.

I'm not saying we can't do it here, but if we did, we'd need:

  • Deciding on a storage mechanism. It could be just the comments on the main table, or a dedicated table.
  • The code to read and write the version number
  • CLI args to:
    • Migrate to the latest version
    • Migrate to a specific version (with a check that we're not going backwards ? )
    • Force-write a migration number, in case you're going to apply one migration yourself (e.g. if you want to modify it)
  • Associated tests & docs
  • Optionally a way to disable those migrations with Django, otherwise it's a footgun (if you type ./manage.py procrastinate migrate instead of ./manage.py migrate, you could get in trouble)
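
For illustration only, the CLI part of the list above could be sketched as follows. Everything here is hypothetical: the program name migrate-sketch, the --to and --force-version flags and the plan() helper are made up for this example and are not part of Procrastinate's CLI. The sketch only decides what to do (migrate to latest, migrate to a given version with a no-going-backwards check, or force-write a version); it runs no SQL.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags, not Procrastinate's actual CLI.
    parser = argparse.ArgumentParser(prog="migrate-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--to", metavar="VERSION",
                       help="migrate up to this version (default: latest)")
    group.add_argument("--force-version", metavar="VERSION",
                       help="record this version without running any SQL")
    return parser


def as_tuple(version: str) -> tuple[int, ...]:
    """Turn "3.1.0" into (3, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def plan(argv: list[str], current: str, latest: str) -> tuple[str, str]:
    """Return ("migrate", target) or ("force", version) for the given arguments."""
    args = build_parser().parse_args(argv)
    if args.force_version is not None:
        return ("force", args.force_version)
    target = args.to or latest
    # Only upgrade scripts exist, so refuse to move to an older version.
    if as_tuple(target) < as_tuple(current):
        raise SystemExit(f"refusing to go backwards: {current} -> {target}")
    return ("migrate", target)
```

With current="3.0.0" and latest="3.2.0", calling plan([], ...) targets the latest version, plan(["--to", "3.1.0"], ...) targets 3.1.0, and plan(["--to", "2.0.0"], ...) exits with an error.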

It's perfectly doable. But it's not trivial. I'm not sure most people want multiple migration systems to cooperate (potentially on the same database) and I'm pretty sure sys admins are not going to be happy when they need to run 2 different migration commands upon deployment. When possible, I really think that if you already have a migration system for your app, you'd rather have Procrastinate use that. At least, I'd want that.

ewjoachim avatar Apr 17 '24 21:04 ewjoachim

Well, I am sorry to say, but by deciding not to deal with any of these, you are pushing the burden onto someone else, not familiar with your codebase, to take on additional responsibilities in order to maintain it. We do not feel it is appropriate to separately deal with each third party utility/extension/plugin that considers it too much work to maintain its own execution environment. And to be totally clear, by no means are you obligated to do so. It's just that Procrastinate does not fit our needs, in which case, no hard feelings! :)

I think the issue can be closed, since it has verified my initial concern!

Thanks!

stratosgear avatar Apr 18 '24 07:04 stratosgear

Sorry :) Maybe I'll reconsider at some point. I understand your point, but this is a one-person volunteer lib until more people step in, and not my only open-source commitment, so I need to be realistic on what I can/want to work on.

I think it's worth keeping it open if other people want to chime-in. Your point is valid, and even if you chose another lib, it's always worth listening to feedback.

(If someone is interested to contribute, please discuss it first)

ewjoachim avatar Apr 19 '24 22:04 ewjoachim

I wonder what possible ways to improve the situation here would be. Maybe an additional table where every applied migration is captured. Then, in the first step, at least a developer using Procrastinate could check (with some command) which migrations were applied. The schema.sql file would be unnecessary then, as the migrations have to be applied (in the correct order) by some script initially. Then, in the next step, a script that will automatically apply later migrations when updating Procrastinate. But somehow, this should only affect non Django users.

EDIT: @ewjoachim I just read you had the same ideas.

Another option I can think of (I mentioned it somewhere else) is to always use a custom migration management and only apply those Django specific model migrations in Django. Then we could hook into the Django migration system using a signal (pre_migrate or post_migrate) and execute our own migration system. Not sure how backward-compatible this would be.

medihack avatar Jun 28 '24 20:06 medihack

I wonder how much of the community uses neither Django nor Alembic. Would it be acceptable to provide Alembic migrations alongside the Django ones, and would that be enough for the vast majority of users?

Otherwise: maybe we could integrate a standalone migration system, such as alembic (which is tied to sqlalchemy but could be used independently), yoyo, or any other standalone migration system, within procrastinate as an optional dependency.

(To be super extra duper clear: I used to be the maintainer of Septentrion (yet another migration tool) and what I've learned is that it's enough of a complex thing to do to deserve its own lib and not be something we want to do in our own codebase.)

I'm perfectly ok revisiting the decision of letting user deal with it, but I think I really don't want to maintain our own solution.

  • Choose a good migration manager
  • Integrate it (with a new contrib folder) with the CLI procrastinate schema migrate
  • Case closed

ewjoachim avatar Jun 28 '24 20:06 ewjoachim

In our own tests, we use migra. As you can see, I have to do all sorts of shenanigans when importing it because it seems unmaintained, and it is also based on schemainspect, which seems equally unmaintained. This is the opportunity to remove the dep.

ewjoachim avatar Jun 28 '24 20:06 ewjoachim

Yes, this makes sense. And I can least estimate how many non Django users are using Procrastinate. As I am a Django user myself the priority regarding this issue is not very high, but maybe it's still good to evolve a plan that somebody else can easily hop in to improve the situation (otherwise it looks more like a better-not-touch issue 😉).

medihack avatar Jun 28 '24 20:06 medihack

non-django user here (though I am using alembic)!

I'd very much value a way to avoid having to build my own migration management tool in order to use procrastinate safely in a CI / CD based production environment. That said: Alembic support would absolutely suit my needs. It seems to me that this would provide a solution for non-django users.

If there is maintainer comfort with this direction, I'd be glad to take a stab at implementing support for this, but would value any opinions on approach!

slifty avatar Jul 09 '24 17:07 slifty

Nice :)

I'm going to push my luck: would you be interested in developing it? Of course, we'll do our best to support you!

ewjoachim avatar Jul 09 '24 17:07 ewjoachim

Yes! I expect the best approach would be a draft PR that lays out an initial implementation that you can give feedback to.

Stay tuned...

slifty avatar Jul 09 '24 18:07 slifty

You'll probably be interested to look how Django migrations are done.

3 steps:

  • A custom sql migration class to ease using the official procrastinate migrations
  • the migrations themselves written manually
  • a test to check we didn't forget one migration

ewjoachim avatar Jul 10 '24 03:07 ewjoachim

After reviewing the ticket, I confirm the greenlight to a simple in-house migration system. Let's see how it goes :)

ewjoachim avatar Dec 30 '24 15:12 ewjoachim

That's great -- also I wanted to chime in to apologize for failing to open my PR earlier this year, team priorities shifted around.

slifty avatar Dec 30 '24 15:12 slifty

Don't worry, we all do what we can with limited resources. Thanks for having considered it, and the door stays open if you want to revisit this.

ewjoachim avatar Dec 30 '24 16:12 ewjoachim

Alembic uses upgrade() and downgrade(), I don't see that in the migration files here. Btw, was there progress with Alembic meanwhile?

We're using Procrastinate in production and one workaround we considered is completely deleting all Procrastinate-related stuff, updating, and re-running the entire migration, but this does not really work well for us.

💡 So I tried to solve the problem by myself and came up with the following solution, which uses as many internal functions as possible:

  1. get the former version (this is a problem, and we might have to add a version table or an odd version file)
  2. get the new version via procrastinate.__version__
  3. read all migrations in order using procrastinate_app.schema_manager.get_migrations_path()
  4. only return migrations that are applicable ( > old_version and <= new_version)
  5. execute every migration file via procrastinate_app.connector.execute_query_async(query=...)
  6. update the version (in the version table we most likely need)
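
Steps 3 and 4 above can be sketched as a pure function. This is a minimal sketch, not the working code mentioned in this comment: it assumes migration files are named "<major>.<minor>.<patch>_<index>_<description>.sql" (an assumption about the filename layout), and the helper names parse_version and pending_migrations are hypothetical. It does not touch the database.

```python
import re

# Assumed filename layout: "<major>.<minor>.<patch>_<index>_<description>.sql"
MIGRATION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)_(\d+)_.*\.sql$")


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "3.2.0" or "03.02.00" into (3, 2, 0).

    Assumes purely numeric components (no "rc1"-style suffixes).
    """
    return tuple(int(part) for part in text.split(".")[:3])


def pending_migrations(names: list[str], old_version: str, new_version: str) -> list[str]:
    """Return the filenames with old_version < version <= new_version, in order."""
    old, new = parse_version(old_version), parse_version(new_version)
    selected = []
    for name in names:
        match = MIGRATION_RE.match(name)
        if match is None:
            continue  # not a migration file, skip it
        *version, index = (int(group) for group in match.groups())
        if old < tuple(version) <= new:
            selected.append((tuple(version), index, name))
    # Sort by version first, then by the index within that version.
    return [name for _, _, name in sorted(selected)]
```

The names argument would come from listing the directory returned by get_migrations_path() (step 3), and each selected file's content would then be passed to execute_query_async(query=...) (step 5), using the calls named in the list above.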

I can understand why you don't want this as a feature. Migrations can break, leaving the database in an uncertain state, people could want to migrate to older versions, which needs downgrade queries, and so on.

But for other people this might ease the way to a solution they can build themselves to continue using Procrastinate in more taxing environments. Or we could integrate something like the above as an experimental forward_migration feature 🤷‍♂️

If someone needs it, I can share the working code I already have. And please let me know if I missed something crucial 👍

riotsnotdiets avatar Nov 04 '25 18:11 riotsnotdiets

Unfortunately, we only have upgrade migrations. But as far as I know, Alembic downgrade migrations are optional. It would be nice to support Alembic directly (as we do with the Django migration system). I gave it a quick shot with Claude Code, but I don't really have the time to try it out or improve it right now.

medihack avatar Nov 09 '25 12:11 medihack

get the former version (this is a problem, and we might have to add a version table or an odd version file)

Yes. A file is not a good idea, there's no guarantee that the filesystem is kept. If you're running on cloud machines, you can't leave data around on disk and expect to find it back later.

It has to be written in the DB, so either a table, or a comment on a table. Of course, a table will be much easier to deal with. I think it's the only solution that makes sense.
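
A minimal sketch of the dedicated-table option, for illustration only. It uses Python's built-in sqlite3 as a stand-in so it runs anywhere; Procrastinate itself targets PostgreSQL, and the table name applied_migrations and the helper names are hypothetical. Each migration is assumed to be a single SQL statement here.

```python
import sqlite3


def ensure_table(conn: sqlite3.Connection) -> None:
    # Hypothetical bookkeeping table: one row per applied migration file.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_migrations ("
        " name TEXT PRIMARY KEY,"
        " applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )


def applied(conn: sqlite3.Connection) -> set[str]:
    """Return the names of the migrations already recorded in the table."""
    ensure_table(conn)
    return {row[0] for row in conn.execute("SELECT name FROM applied_migrations")}


def apply_pending(conn: sqlite3.Connection, migrations: dict[str, str]) -> list[str]:
    """Run every migration not yet recorded, in name order; return what ran."""
    done = applied(conn)
    ran = []
    for name in sorted(migrations):
        if name in done:
            continue
        with conn:  # commits if the block succeeds
            conn.execute(migrations[name])
            conn.execute("INSERT INTO applied_migrations (name) VALUES (?)", (name,))
        ran.append(name)
    return ran
```

Running apply_pending twice with the same input applies everything the first time and nothing the second time. On PostgreSQL, DDL is transactional, so a migration and its bookkeeping row could be committed together in one transaction.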

ewjoachim avatar Nov 10 '25 13:11 ewjoachim

@medihack

I gave it a quick shot with Claude Code, but I don't really have the time to try it out or improve it right now.

I tried a similar approach a while ago, but I abandoned it after I found several errors in the generated code. I'd rather consider creating an adapter to make Alembic run the SQL files provided by procrastinate.
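
One possible shape for such an adapter, as a rough sketch only: a small generator that wraps the text of one SQL file in an Alembic revision module. The render_revision helper and the template are hypothetical and untested against the real migration files; the generated module relies on Alembic's op.execute() and the usual revision/down_revision variables. Since only upgrade scripts exist, downgrade() simply raises.

```python
from typing import Optional

# Template for a generated Alembic revision module (forward-only).
TEMPLATE = '''"""{message}"""
from alembic import op

revision = {revision!r}
down_revision = {down_revision!r}
branch_labels = None
depends_on = None

SQL = {sql!r}


def upgrade():
    op.execute(SQL)


def downgrade():
    raise NotImplementedError("forward-only: no downgrade SQL is provided")
'''


def render_revision(sql: str, revision: str,
                    down_revision: Optional[str], message: str) -> str:
    """Return the source of an Alembic revision that runs the given SQL as-is.

    Assumes message contains no triple quotes, since it becomes the docstring.
    """
    return TEMPLATE.format(
        message=message, revision=revision, down_revision=down_revision, sql=sql
    )
```

Calling render_revision once per migration file, chaining each down_revision to the previous revision id, would yield one Alembic revision per Procrastinate migration. Whether op.execute() accepts every statement in those files unchanged would need to be checked.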

Melebius avatar Nov 10 '25 14:11 Melebius

Coming to use Procrastinate now on a second project, where the previous one was Django and things "just worked", to now see the migration situation 😱

I'm not even sure about what to do with the pre/post stuff. The infra always runs migrations first and, if successful, deploys the new services. What am I supposed to do with the post ones? Do I need to split every procrastinate upgrade into two deploys?

  • PR 1: Upgrade procrastinate + copy the pre migration to our migrations
  • PR 2: Copy the post migration to our migrations

?

Frankly, I would prefer that Procrastinate just take ownership of the procrastinate schema and do whatever it wants to do in there.

// Edit: I'm not sure what the policy is for semver wrt migrations, but it would be nice if, for example, migrations would require minor version upgrades (i.e. not patch) and then the quickstart guide would recommend installing with a sufficiently restrictive version selector (in this case x.y.*) to ensure migrations are never required if just updating the lock file.

// Edit: Oh and also, a random suggestion: IMO schema.sql should be autogenerated by applying the migrations, then doing pg_dump. It's the only way for the project to really be sure the migrations lead to the same outcome as the current schema (which then really serves only for review purposes; there can be a test that fails if it's not up-to-date).

jakajancar avatar Nov 19 '25 05:11 jakajancar