[Core feature] Specifying TTL for flyte managed files
Motivation: Why do you think this is important?
Data passed between tasks in Flyte workflows typically takes the form of a FlyteFile, which abstracts away how that data is stored in a remote object store such as S3. Regulatory requirements often impose a maximum lifetime on data, but there is no obvious way to specify the lifetime of files that Flyte manages. If Flyte supported a specification such as a time to live (TTL) at its abstraction layer, users could apply expirations granularly (expiring only the data that needs to expire) and in a cloud-agnostic way, which would ultimately help them meet their regulatory requirements.
Goal: What should the final outcome look like, ideally?
Ideally, when constructing a FlyteFile or FlyteDirectory, the user could pass in an optional TTL or expiry timestamp. Once a Flyte workflow creates that object, the Flyte backend would be aware of it, and a periodic job would clean up the object(s) once they have expired. A record of the expiration action, obtainable through Flyte's APIs or visible somewhere in flyteconsole, would support auditability and compliance efforts. Past workflows that are re-run or otherwise inspected should handle the fact that some object(s) may have expired.
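To make the desired behaviour concrete, here is a minimal sketch of the expiry check and the periodic cleanup pass, written in plain Python without flytekit. Everything in it is hypothetical: `ManagedObject`, `ExpirationRecord`, `sweep`, and the `delete` callback are illustrative names, not existing Flyte APIs, and a real implementation would live in the Flyte backend and call the configured object store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional


@dataclass
class ManagedObject:
    """Hypothetical record the backend would keep per Flyte-managed object."""
    uri: str
    created_at: datetime
    ttl: Optional[timedelta] = None  # None means the object never expires

    def expires_at(self) -> Optional[datetime]:
        return None if self.ttl is None else self.created_at + self.ttl

    def is_expired(self, now: datetime) -> bool:
        deadline = self.expires_at()
        return deadline is not None and now >= deadline


@dataclass
class ExpirationRecord:
    """Hypothetical audit entry written when an expired object is removed."""
    uri: str
    expired_at: datetime
    deleted_at: datetime


def sweep(
    objects: Iterable[ManagedObject],
    delete: Callable[[str], None],
    now: Optional[datetime] = None,
) -> List[ExpirationRecord]:
    """One pass of the periodic job: delete expired objects and record it."""
    if now is None:
        now = datetime.now(timezone.utc)
    records = []
    for obj in objects:
        if obj.is_expired(now):
            delete(obj.uri)  # assumed callback into the object store
            records.append(ExpirationRecord(obj.uri, obj.expires_at(), now))
    return records
```

The returned records are what an audit view could surface, and they would also let a re-run or inspection of a past workflow distinguish an expired object from a missing one.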
Describe alternatives you've considered
- Apply lifetime policies to objects in the Flyte backend blob store. This would also delete objects that do not need to be deleted, so it is not optimal.
- Apply lifetime policies to objects in the Flyte backend object store in conjunction with tag filtering, whereby Flyte would still need to know which objects to tag but would delegate the cleanup to the backend store. This is not as granular as the proposal and would need cloud-vendor-specific implementations. The bucket configuration would also have to be coordinated with the Flyte configuration for this to work.
In both cases, Flyte may be unaware that the configured backend object store has deleted those objects, leading to a suboptimal user experience when interacting with past workflows in which those objects were passed or output.
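For reference, the tag-filtering alternative on S3 might look roughly like the sketch below. It only builds the lifecycle configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`; it does not call AWS. The helper name and the tag key and value are assumptions made for illustration, and Flyte does not currently tag objects this way.

```python
from typing import Any, Dict


def tag_scoped_expiration_rule(tag_key: str, tag_value: str, days: int) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that expires only tagged objects.

    The tag key/value are hypothetical; Flyte would have to apply them
    when writing the object for this rule to match anything.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-{tag_key}-{tag_value}",
                # Only objects carrying this exact tag are affected.
                "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# Would be passed as LifecycleConfiguration=... to
# s3_client.put_bucket_lifecycle_configuration(Bucket=...).
config = tag_scoped_expiration_rule("flyte-ttl-days", "30", 30)
```

Because each rule carries a fixed number of days, every distinct TTL needs its own tag value and rule, which is part of why this approach is less granular and has to be kept in sync with the Flyte configuration.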
Propose: Link/Inline OR Additional context
Suggested in slack thread
Are you sure this issue hasn't been raised already?
- [X] Yes
Have you read the Code of Conduct?
- [X] Yes