[Core feature] Allow users to control storage for caching
Motivation: Why do you think this is important?
I am using Flyte to process large datasets (a couple of GB each) and have enabled caching for my tasks. However, I noticed that my storage quickly filled up because old cache entries are never deleted.
Goal: What should the final outcome look like, ideally?
I propose the following features:
- Users should be able to configure a limit (at least a soft limit) on the amount of storage consumed by the cache. Cache entries should then be evicted using a least-recently-used (LRU) policy, or perhaps something smarter: many small cache entries might be more valuable than one large one.
- The Flyte API (and CLI) should expose a way to clear the cache, either globally or for individual tasks, projects, etc.
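The two requested features can be sketched together as a small in-memory index. This is a minimal sketch under stated assumptions, not Flyte code: `CacheIndex`, `soft_limit_bytes`, `record`, `touch`, and `clear` are hypothetical names, and entries are keyed by an assumed `(project, task, cache_key)` tuple. Eviction is plain LRU by last access; the size-aware alternative suggested above is not implemented.

```python
from collections import OrderedDict
from typing import Optional

# Hypothetical key: (project, task_name, cache_key). Not a real Flyte type.
Key = tuple[str, str, str]


class CacheIndex:
    """Hypothetical cache index with a soft size limit and LRU eviction."""

    def __init__(self, soft_limit_bytes: int) -> None:
        self.soft_limit_bytes = soft_limit_bytes
        self.total_bytes = 0
        # Ordered from least recently used (first) to most recently used (last).
        self._entries: "OrderedDict[Key, int]" = OrderedDict()

    def record(self, key: Key, size_bytes: int) -> list[Key]:
        """Store or replace an entry, then evict LRU entries until under the limit.

        Returns the evicted keys so a caller could also delete the backing blobs.
        """
        if key in self._entries:
            self.total_bytes -= self._entries.pop(key)
        self._entries[key] = size_bytes
        self.total_bytes += size_bytes
        return self._evict()

    def touch(self, key: Key) -> bool:
        """Mark an entry as used on a cache hit; returns False on a miss."""
        if key not in self._entries:
            return False
        self._entries.move_to_end(key)
        return True

    def clear(self, project: Optional[str] = None, task: Optional[str] = None) -> list[Key]:
        """Remove all entries, or only those matching the given project and/or task."""
        doomed = [
            k for k in self._entries
            if (project is None or k[0] == project) and (task is None or k[1] == task)
        ]
        for k in doomed:
            self.total_bytes -= self._entries.pop(k)
        return doomed

    def _evict(self) -> list[Key]:
        evicted = []
        # Soft limit: the newest entry is kept even if it alone exceeds the limit.
        while self.total_bytes > self.soft_limit_bytes and len(self._entries) > 1:
            k, size = self._entries.popitem(last=False)
            self.total_bytes -= size
            evicted.append(k)
        return evicted


# Usage (sizes in bytes, soft limit of 5 GB; project and task names are illustrative):
idx = CacheIndex(soft_limit_bytes=5_000_000_000)
idx.record(("flytesnacks", "preprocess", "abc"), 3_000_000_000)
idx.record(("flytesnacks", "train", "def"), 1_000_000_000)
idx.touch(("flytesnacks", "preprocess", "abc"))  # cache hit: preprocess becomes most recent
print(idx.record(("flytesnacks", "train", "ghi"), 2_000_000_000))
# → [('flytesnacks', 'train', 'def')]
```

Returning the evicted keys, rather than deleting data inside the index, reflects the assumption that a real implementation would have to remove the catalog record and the stored blob together to stay consistent.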
Describe alternatives you've considered
AFAIK, there is currently no way to clear the cache.
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
- [x] Yes
Have you read the Code of Conduct?
- [x] Yes
Flyte does not currently support user-configurable storage limits or automatic eviction (e.g., LRU) of cached task outputs: there is no built-in way to cap the cache size or to evict old entries based on usage or age. Cache cleanup must be handled manually or via external storage lifecycle policies, which can leave dangling pointers in Datacatalog and cause errors when Flyte tries to read the missing data. There is also no official API or CLI command to clear the cache for individual tasks, projects, or globally, though open feature requests and RFCs discuss these needs and possible future implementations (Flyte issue #6450, Flyte issue #3305, RFC: cache eviction).
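To illustrate how an external lifecycle rule can leave dangling pointers, here is a minimal sketch with hypothetical data: the catalog is modeled as a plain dict from cache key to the URI of the cached output, the bucket and object names are made up, and `find_dangling` is an illustrative helper, not part of Datacatalog's API. An entry whose blob was expired by the storage rule still looks like a cache hit to the catalog, but reading it fails.

```python
def find_dangling(catalog: dict[str, str], existing_blobs: set[str]) -> list[str]:
    """Return cache keys whose recorded output URI no longer exists in blob storage."""
    return sorted(key for key, uri in catalog.items() if uri not in existing_blobs)


# Hypothetical catalog: cache key -> URI of the cached output.
catalog = {
    "preprocess:abc": "s3://my-bucket/outputs/abc/out.parquet",
    "train:def": "s3://my-bucket/outputs/def/model.pt",
}
# Suppose a lifecycle rule expired the older object but the catalog was not updated.
existing = {"s3://my-bucket/outputs/def/model.pt"}
print(find_dangling(catalog, existing))  # ['preprocess:abc']
```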
Currently, the only related setting is a global `max-cache-age` in FlytePropeller, which limits how old a cache entry can be and still be used; it does not clean up database entries, and it does not support per-task TTLs. Manually deleting cache entries from Datacatalog and blob storage is possible but not recommended because of the risk of inconsistency. Would you like more detail on workarounds or on the current state of cache management in Flyte?
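The difference between ignoring old entries and deleting them can be shown with a rough model of a max-age check. This is a hedged sketch of the behavior described above, not FlytePropeller's code: `is_cache_hit` and its parameters are hypothetical names, and treating `None` as "no age limit" is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_cache_hit(created_at: datetime, now: datetime, max_cache_age: Optional[timedelta]) -> bool:
    """Treat an existing entry as a hit only if it is no older than max_cache_age.

    None means "no age limit" (an assumption of this sketch). A stale entry is
    only ignored here; nothing removes it from the catalog or the blob store.
    """
    if max_cache_age is None:
        return True
    return now - created_at <= max_cache_age


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
week = timedelta(days=7)
print(is_cache_hit(datetime(2024, 1, 5, tzinfo=timezone.utc), now, week))  # True
print(is_cache_hit(datetime(2024, 1, 1, tzinfo=timezone.utc), now, week))  # False, yet the entry still exists
```

Because a stale entry is simply skipped, its storage is never reclaimed, which matches the growth described in the original report.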
Is this local or remote?
Remote. I am using minikube with flyte-binary. I don't know how local execution behaves, but I would guess it is similar: local mode also supports caching, and the data has to be stored somewhere either way.