[Core feature] Allow users to control storage for caching

Open Mathis-Z opened this issue 3 months ago • 3 comments

Motivation: Why do you think this is important?

I am using Flyte to process large datasets (a couple of GB each) and have enabled caching for my tasks. However, my storage filled up quickly because old cache entries are never deleted.

Goal: What should the final outcome look like, ideally?

I propose the following features:

  • Users should be able to configure a limit (at least a soft limit) on the amount of storage consumed by the cache. Cache entries should then be evicted using a least-recently-used policy (or perhaps something smarter: many small cache entries might be more important than one large one).
  • The Flyte API (and CLI) should expose a way to clear the cache, both globally and for individual tasks, projects, and so on.
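
To make the first point concrete, here is a rough sketch of the eviction behaviour I have in mind. It is plain Python and not Flyte code: the class `SizeBoundedLRU` and its `touch` method are hypothetical names made up for illustration, and a real implementation would also have to delete the evicted blobs and their catalog entries.

```python
from collections import OrderedDict
from typing import List


class SizeBoundedLRU:
    """Hypothetical tracker: evicts least-recently-used entries over a soft byte limit."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.total_bytes = 0
        # Maps cache key -> entry size in bytes, ordered from least to most recently used.
        self._entries: "OrderedDict[str, int]" = OrderedDict()

    def touch(self, key: str, size_bytes: int) -> List[str]:
        """Record a write or a hit for `key`; return the keys that should be evicted."""
        if key in self._entries:
            # Remove the old record so the key moves to the most-recent end.
            self.total_bytes -= self._entries.pop(key)
        self._entries[key] = size_bytes
        self.total_bytes += size_bytes

        evicted: List[str] = []
        # Soft limit: the entry just touched is never evicted, even if it alone
        # exceeds the budget.
        while self.total_bytes > self.max_bytes and len(self._entries) > 1:
            old_key, old_size = self._entries.popitem(last=False)
            self.total_bytes -= old_size
            evicted.append(old_key)
        return evicted
```

With a 10-byte budget, writing `a` and `b` (4 bytes each), reading `a` again, and then writing `c` (4 bytes) would evict `b`, since it is the least recently used entry. A size-aware policy (preferring to keep many small entries over one large one) could replace the `popitem(last=False)` choice without changing the rest.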

Describe alternatives you've considered

AFAIK, there is currently no way to clear the cache.

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • [x] Yes

Have you read the Code of Conduct?

  • [x] Yes

Mathis-Z avatar Sep 18 '25 12:09 Mathis-Z

Flyte does not currently support user-configurable storage limits or automatic eviction (e.g., LRU) for cached task outputs. There is no built-in mechanism to set a cache size limit or to automatically evict old cache entries based on usage or age. Cache cleanup must be handled manually or via external storage lifecycle policies, which can result in dangling pointers in Datacatalog and errors if Flyte tries to access missing data. There is also no official API or CLI command to clear the cache for individual tasks, projects, or globally, though there are open feature requests and RFCs discussing these needs and possible future implementations (Flyte issue #6450, Flyte issue #3305, RFC: cache eviction).

Currently, the only related configuration is a global max-cache-age in FlytePropeller, which limits how far back cache entries are considered, but it does not clean up database entries or support per-task TTLs. Manual deletion of cache entries from Datacatalog and blob storage is possible but not recommended due to risk of inconsistency. Would you like more detail on workarounds or the current state of cache management in Flyte?

runllm[bot] avatar Sep 18 '25 12:09 runllm[bot]

Is this local or remote?

kumare3 avatar Sep 18 '25 13:09 kumare3

Remote. I am using minikube with flyte-binary. I don't know how local execution would behave, but I am guessing it would be similar: local execution supports caching as well, and the data has to be stored somewhere either way.

Mathis-Z avatar Sep 18 '25 14:09 Mathis-Z