prefect
Auto version flows stored in local file system
Current behavior
Currently, if I store flows locally, either through the `Local` storage or manually with `flow.save`, there will only exist a single flow based on the flow's name.
e.g.
from prefect import Flow

Flow('test').save()
# ... change the test flow's code ...
Flow('test').save()
# Now the original `test` flow that was saved is gone
This is undesirable for local development, and especially when using the `Local` storage: on each `flow.register` the flows will be versioned in Cloud, but if I unarchive an old flow and go back to run it, the local pickled flow will already have been overwritten by the newest version.
Proposed behavior
Start versioning locally saved flows. This is where the discussion comes into play because there are two different stories here:
1. Manual flow saves w/ `flow.save` and `flow.load` (Cloud-less)
2. Flows registered w/ Cloud using the `Local` storage
For scenario 2: flows could be stored w/ some combination of tenant id, project, cloud version, etc...
'/.prefect/flows/my-flow.prefect'
could become something like
'/.prefect/flows/tenant-id/project/my-flow.prefect'
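A minimal sketch of how such a namespaced path could be built, using only the standard library. The function name `local_flow_path` and its parameters are hypothetical and not part of Prefect; the exact combination of tenant id, project, and version is still open for discussion.

```python
from pathlib import Path


def local_flow_path(base_dir, tenant_id, project, flow_slug):
    """Build a storage path namespaced by tenant and project.

    Hypothetical helper for illustration only; not a Prefect API.
    """
    return Path(base_dir) / tenant_id / project / f"{flow_slug}.prefect"


path = local_flow_path("/.prefect/flows", "tenant-id", "project", "my-flow")
print(path.as_posix())
```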
For scenario 1: ideally we would implement some sort of local auto-versioning; however, that could get confusing if the saved name does not directly match the slugified flow name (it makes the flow harder to retrieve without knowing which version/uuid/etc. it was stored with).
'/.prefect/flows/my-flow.prefect'
could become something like
'/.prefect/flows/my-flow-XXXXX.prefect'
I would also like to point out that we currently do something like this for our cloud storage based flows with a slugified datetime corresponding to the time of upload.
e.g. a flow named `etl-s3` would be saved in an S3 bucket under
s3://my-bucket-of-flows/etl-s3/2019-12-30t19-43-55-672201-00-00
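For illustration, a key of that shape can be produced from a UTC timestamp with the standard library alone. This is a sketch under that assumption, not necessarily how the cloud storage classes build their keys; `slugify_timestamp` is a hypothetical name.

```python
import re
from datetime import datetime, timezone


def slugify_timestamp(moment):
    """Turn an aware datetime into a lowercase, hyphen-separated slug.

    Hypothetical helper: every run of non-alphanumeric characters in the
    ISO 8601 string is replaced with a single hyphen.
    """
    iso = moment.isoformat().lower()
    return re.sub(r"[^a-z0-9]+", "-", iso).strip("-")


uploaded_at = datetime(2019, 12, 30, 19, 43, 55, 672201, tzinfo=timezone.utc)
print(f"etl-s3/{slugify_timestamp(uploaded_at)}")
```

Note that `isoformat()` omits the fractional part when the microsecond field is zero, so the slug length is not fixed.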
Discussing together with the Core team, we observed that
- the s3 storage slugifies with a timestamp, which solves this from that perspective
- docker storage does a timestamp for the tag as well
- local does not do this by default
So it seems that we should make local consistent with the other storage formats. One downside discussed is that it becomes more difficult to retrieve a flow manually from storage -- and we do see that people, when debugging, want to easily get hold of their flows serialized to local storage. We do not see many people actually needing their old serializations of the same flow.
Therefore we are considering
- optional toggle to add timestamp (better for the automation use case)
- flow.save() -> if a prefect file with that flow name already exists, move it to a new name carrying the timestamp of archiving, then save the new flow under the original name
- doesn't help if you have a flow with the same name in two projects, but that's already broken now :)
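The archive-on-save option above could look roughly like the following. This is a sketch only: `save_with_archive` is a hypothetical stand-in for what `flow.save()` would do internally, it takes already-serialized bytes rather than a flow object, and the timestamp format is an assumption.

```python
from datetime import datetime, timezone
from pathlib import Path


def save_with_archive(flow_bytes, flow_slug, directory, now=None):
    """Write <slug>.prefect, archiving any existing file under a timestamp.

    Hypothetical helper; returns (current_path, archived_path_or_None).
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{flow_slug}.prefect"

    archived = None
    if target.exists():
        if now is None:
            now = datetime.now(timezone.utc)
        stamp = now.strftime("%Y-%m-%dt%H-%M-%S-%f")
        archived = directory / f"{flow_slug}-{stamp}.prefect"
        # Move the previous serialization aside before overwriting.
        target.rename(archived)

    target.write_bytes(flow_bytes)
    return target, archived
```

With this approach the newest flow always stays at the predictable `my-flow.prefect` path, which keeps manual retrieval easy, while older serializations remain on disk next to it.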
Regarding manual retrieval from storage -- we could just always add the timestamp to bring this in line with other storage methods e.g.
flow-name-{timestamp}.prefect
then, since it's a local file system and we have the power to do symlinks, we can do
flow-name-latest.prefect
which points to the timestamped flow. We don't need to track `latest` from the backend -- it'd just be for this manual inspection case.
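A rough sketch of the timestamp-plus-symlink idea, assuming a POSIX-like file system where creating symlinks needs no special privileges. `save_timestamped` is a hypothetical name, it works on already-serialized bytes, and the timestamp format is an assumption.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def save_timestamped(flow_bytes, flow_slug, directory, now=None):
    """Write <slug>-<timestamp>.prefect and repoint <slug>-latest.prefect.

    Hypothetical helper; returns the path of the timestamped file.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dt%H-%M-%S-%f")

    versioned = directory / f"{flow_slug}-{stamp}.prefect"
    versioned.write_bytes(flow_bytes)

    latest = directory / f"{flow_slug}-latest.prefect"
    if latest.is_symlink() or latest.exists():
        latest.unlink()
    # Relative target, so the directory can be moved without breaking the link.
    latest.symlink_to(versioned.name)
    return versioned
```

Opening `my-flow-latest.prefect` then always reads the most recent save, while every earlier version stays available under its own timestamped name.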
@madkinsz Yeah I like that idea. I think just doing the timestamp in the name is enough to solve most use cases here 👍