
Auto version flows stored in local file system

Open joshmeek opened this issue 5 years ago • 3 comments

Current behavior

Currently, if I store flows locally, either through the Local storage or manually with flow.save, only a single flow will exist for a given flow name.

e.g.

```python
Flow('test').save()

# change test flow code...

Flow('test').save()

# Now the original `test` flow that was saved is gone
```
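The overwrite can be sketched without Prefect at all. The slug rules and file layout below are assumptions for illustration, not Prefect's actual implementation; the point is only that a path derived solely from the flow name means a second save silently replaces the first:

```python
# Minimal sketch of the collision: the storage path depends only on the
# slugified flow name, so saving twice writes to the same file.
import re
import tempfile
from pathlib import Path

def slugify(name: str) -> str:
    # crude slug: lowercase, runs of non-alphanumerics become hyphens
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def save_flow(flows_dir: Path, name: str, payload: bytes) -> Path:
    path = flows_dir / f"{slugify(name)}.prefect"
    path.write_bytes(payload)  # silently replaces any existing file
    return path

flows_dir = Path(tempfile.mkdtemp())
first = save_flow(flows_dir, "test", b"original flow code")
second = save_flow(flows_dir, "test", b"changed flow code")
# first and second are the same path; the original bytes are gone
```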

This is undesirable for local development, and especially when using the Local storage: on each flow.register the flows are versioned in Cloud, but if I unarchived an old flow and went back to run it, the local pickled flow would already have been overwritten by the newest version.

Proposed behavior

Start versioning locally saved flows. This is where the discussion comes into play because there are two different stories here:

  1. Manual flow saves w/ flow.save and flow.load (Cloud-less)
  2. Flows registered w/ Cloud using the Local storage

For scenario 2: flows could be stored w/ some combination of tenant id, project, cloud version, etc...

'/.prefect/flows/my-flow.prefect'

could become something like

'/.prefect/flows/tenant-id/project/my-flow.prefect'
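For the scenario 2 layout, the path construction is straightforward; the tenant id and project names below are illustrative placeholders, not a real scheme:

```python
# Hypothetical namespaced layout for Cloud-registered flows using Local
# storage: one directory level per tenant and per project.
from pathlib import PurePosixPath

def flow_path(root: PurePosixPath, tenant_id: str, project: str, flow_slug: str) -> PurePosixPath:
    # <root>/flows/<tenant-id>/<project>/<flow-slug>.prefect
    return root / "flows" / tenant_id / project / f"{flow_slug}.prefect"

p = flow_path(PurePosixPath("/.prefect"), "tenant-id", "project", "my-flow")
```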

For scenario 1: ideally we would implement some sort of local auto-versioning; however, that could get confusing if the saved name does not directly match the slugified flow name (it becomes harder to retrieve a flow without knowing which version/uuid/etc. it was stored with).

'/.prefect/flows/my-flow.prefect'

could become something like

'/.prefect/flows/my-flow-XXXXX.prefect'

I would also like to point out that we currently do something like this for our cloud storage based flows with a slugified datetime corresponding to the time of upload.

e.g. a flow named etl-s3 would be saved in an S3 bucket under

s3://my-bucket-of-flows/etl-s3/2019-12-30t19-43-55-672201-00-00
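A timestamp slug like that one can be produced by lowercasing an ISO timestamp and collapsing the characters that are unsafe in keys or filenames to hyphens. The exact slugify rules here are an assumption for illustration, not a quote of Prefect's code:

```python
# Sketch: slugify a UTC timestamp the way the S3 storage key above looks.
import re
from datetime import datetime, timezone

def timestamp_slug(dt: datetime) -> str:
    # lowercase the ISO form, then turn ':', '.', '+' etc. into hyphens
    return re.sub(r"[^a-z0-9]+", "-", dt.isoformat().lower())

dt = datetime(2019, 12, 30, 19, 43, 55, 672201, tzinfo=timezone.utc)
slug = timestamp_slug(dt)
key = f"s3://my-bucket-of-flows/etl-s3/{slug}"
```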

joshmeek avatar Jan 13 '20 19:01 joshmeek

Discussing together with the Core team, we observed that

  • the s3 storage slugifies with a timestamp, which solves this from that perspective
  • docker storage does a timestamp for the tag as well
  • local does not do this by default

So it seems that we should make local consistent with the other storage formats. One downside discussed is that it becomes more difficult to retrieve a flow manually from storage -- and we do see that people debugging want an easy way to get hold of their flows serialized to local storage. We do not observe many people actually needing to see old serializations of the same flow.

Therefore we are considering

  1. optional toggle to add timestamp (better for the automation use case)
  2. flow.save() -> if a prefect file with that flow name already exists, move it aside under an archive name carrying the current timestamp, then save the new one in its place
    • doesn't help if you have a flow with the same name in two projects, but that's already broken now :)
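Option 2 above can be sketched as follows; the archive naming and use of a Unix timestamp are assumptions for illustration:

```python
# Sketch of archive-on-save: an existing file is renamed with a timestamp
# before the new serialization is written to the canonical name.
import tempfile
import time
from pathlib import Path

def save_with_archive(flows_dir: Path, slug: str, payload: bytes) -> Path:
    path = flows_dir / f"{slug}.prefect"
    if path.exists():
        # keep the old version under a timestamped archive name
        archived = flows_dir / f"{slug}-{time.time_ns()}.prefect"
        path.rename(archived)
    path.write_bytes(payload)
    return path

flows_dir = Path(tempfile.mkdtemp())
save_with_archive(flows_dir, "test", b"v1")
latest = save_with_archive(flows_dir, "test", b"v2")
# flows_dir now holds test.prefect (v2) plus one timestamped archive (v1)
```

One property of this scheme is that the canonical name always holds the newest version, so existing flow.load calls keep working unchanged.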

lauralorenz avatar Apr 09 '20 18:04 lauralorenz

Regarding manual retrieval from storage -- we could just always add the timestamp to bring this in line with other storage methods e.g.

flow-name-{timestamp}.prefect

then since it's a local file system and we have the power to do sym links we can do

flow-name-latest.prefect

which points to the timestamped flow. We don't need to track latest from the backend -- it'd just be for this manual inspection case.
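The symlink idea could look like the sketch below; the names and layout are assumptions for illustration, and it requires a filesystem (and, on Windows, privileges) that supports symlinks:

```python
# Sketch: each save writes a new timestamped file, and the "-latest"
# symlink is repointed at it via an atomic rename.
import tempfile
import time
from pathlib import Path

def save_versioned(flows_dir: Path, slug: str, payload: bytes) -> Path:
    versioned = flows_dir / f"{slug}-{time.time_ns()}.prefect"
    versioned.write_bytes(payload)
    latest = flows_dir / f"{slug}-latest.prefect"
    tmp = flows_dir / f".{slug}-latest.tmp"
    tmp.symlink_to(versioned)  # build the new link...
    tmp.replace(latest)        # ...then atomically swap it into place
    return versioned

flows_dir = Path(tempfile.mkdtemp())
save_versioned(flows_dir, "flow-name", b"v1")
v2 = save_versioned(flows_dir, "flow-name", b"v2")
latest = flows_dir / "flow-name-latest.prefect"
# reading through the symlink always yields the newest serialization
```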

zanieb avatar Dec 29 '20 16:12 zanieb

@madkinsz Yeah I like that idea. I think just doing the timestamp in the name is enough to solve most use cases here 👍

joshmeek avatar Dec 29 '20 18:12 joshmeek