[Core feature] Support cache overwrite flag at task level
Motivation: Why do you think this is important?
Currently, when tasks need to bypass cache and force re-execution, developers need to modify the cache_version parameter. However, this is an internal implementation detail that shouldn't be exposed to end users. Users should have a more intuitive way to control cache behavior at runtime.
Goal: What should the final outcome look like, ideally?
Proposal
Add support for a cache_overwrite or force_recompute flag at the task level that can be set during execution. This would allow:
- Task definition remains unchanged
- Users can control cache behavior at runtime
- Clear separation between developer concerns (cache_version) and user control
Describe alternatives you've considered
- Workflow-Level Cache Control
  - Add cache control parameters at workflow level only
  - Pros: Simpler API surface, clearer execution tracking
  - Cons: Too coarse-grained, can't control individual tasks
- Using Cache Version
  - Current approach: Modify cache_version
  - Pros: Already implemented
  - Cons: Mixes developer and user concerns, requires code changes
Propose: Link/Inline OR Additional context
Example API could look like:
    @task(cache=True, overwrite_cache=False)  # Default behavior
    def my_task():
        pass

    # At runtime
    wf.my_task(overwrite_cache=True)  # Force recompute
Implementation (rough idea):
- Add one more field called overwrite_cache to the Flyte task decorator
- Add a helper func IsTaskCacheOverwrite to cache.go
- In executor.go, check IsTaskCacheOverwrite; if it is true, return CatalogCacheStatus_CACHE_SKIPPED
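The three steps above can be sketched in Python to make the intended decision order concrete. This is only a toy model, not Flyte's Go code: `TaskCacheConfig`, `is_task_cache_overwrite`, `decide_cache_status`, and the `CacheDecision` enum are hypothetical names made up for illustration, and the rule that a runtime value wins over the decorator default is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CacheDecision(Enum):
    # Toy enum for this sketch; it is not Flyte's catalog status enum.
    CACHE_DISABLED = "CACHE_DISABLED"        # task does not use caching at all
    CACHE_SKIPPED = "CACHE_SKIPPED"          # lookup bypassed, task re-executes
    PROCEED_TO_LOOKUP = "PROCEED_TO_LOOKUP"  # normal path: consult the cache


@dataclass
class TaskCacheConfig:
    cache: bool = False
    cache_version: str = ""
    overwrite_cache: bool = False  # the proposed new field (hypothetical)


def is_task_cache_overwrite(
    config: TaskCacheConfig, runtime_override: Optional[bool] = None
) -> bool:
    """Python stand-in for the proposed IsTaskCacheOverwrite helper."""
    # Assumption: a value supplied at execution time wins over the decorator default.
    if runtime_override is not None:
        return runtime_override
    return config.overwrite_cache


def decide_cache_status(
    config: TaskCacheConfig, runtime_override: Optional[bool] = None
) -> CacheDecision:
    """Stand-in for the check the proposal places in executor.go."""
    if not config.cache:
        return CacheDecision.CACHE_DISABLED
    if is_task_cache_overwrite(config, runtime_override):
        return CacheDecision.CACHE_SKIPPED
    return CacheDecision.PROCEED_TO_LOOKUP
```

In this model the task definition stays unchanged and only `runtime_override` differs between a normal run and a forced recompute.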
Are you sure this issue hasn't been raised already?
- [X] Yes
Have you read the Code of Conduct?
- [X] Yes
The UX is harder than it looks, as you have to find the right task and replace it, and the same task may be repeated multiple times. If you simply want to overwrite the value, change the cache_version and it will achieve the same effect. This can be overridden in the workflow at registration time. Today we do not have an easy way to pass it in during execution. Can you suggest a nice UX for a complex workflow?
@kumare3 Thanks for the reply.
Regarding the cache_version solution, it would work for ad hoc/experimental workflow code (though it requires some understanding of how caching works in Flyte). I know it's easy, but our scenario is different.
In our company, we (the ML Infra Team) built a set of reusable workflows and tasks, and our users (AI verticals like Feed Ranking) use them through a wrapper like this:
    # Code from the ML Infra Team in `trainer.py` from repo_A
    from flytekit import task

    @task(cache=False, cache_version="v1")
    def trainer_task():
        pass

    # Code from the User Team in workflow.py from repo_B
    from flytekit import workflow
    from repo_A.trainer import trainer_task

    @workflow
    def my_wf():
        trainer_task().with_overrides(cache=True, cache_version="v2")
So workflow.py will be shared by a user team, and multiple developers will be working on the same workflow code.
Hence different developers may have trouble coordinating on the cache_version.
If we provide overwrite_cache for tasks, a user can simply set overwrite_cache=True to make sure this run recomputes and writes to the cache, instead of coordinating the cache_version with teammates.
So I would suggest supporting overwrite_cache in the task decorator, and I would like to contribute if the idea sounds reasonable to you.
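To illustrate the coordination argument, here is a small in-memory simulation. It is not Flyte's caching implementation: `ToyCache` and its `run` method are hypothetical, and the key layout (task name, cache_version, inputs) is a simplifying assumption. It shows a forced run refreshing the cached entry while the shared cache_version stays at "v2".

```python
from typing import Any, Callable, Dict, Tuple

CacheKey = Tuple[str, str, Tuple[Any, ...]]


class ToyCache:
    """Hypothetical in-memory cache keyed by (task name, cache_version, inputs)."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, Any] = {}
        self.executions = 0  # how many times a task body actually ran

    def run(
        self,
        fn: Callable[..., Any],
        *args: Any,
        cache_version: str,
        overwrite_cache: bool = False,
    ) -> Any:
        key: CacheKey = (fn.__name__, cache_version, args)
        # Normal path: reuse the stored result when one exists.
        if not overwrite_cache and key in self._store:
            return self._store[key]
        # Cache miss or forced recompute: run the task and (re)write the entry.
        self.executions += 1
        result = fn(*args)
        self._store[key] = result
        return result


def trainer_task(epochs: int) -> str:
    return f"model trained for {epochs} epochs"


cache = ToyCache()
cache.run(trainer_task, 3, cache_version="v2")                        # miss: executes
cache.run(trainer_task, 3, cache_version="v2")                        # hit: no execution
cache.run(trainer_task, 3, cache_version="v2", overwrite_cache=True)  # forced: executes
print(cache.executions)  # 2
```

Because the key never changes, teammates who keep running with cache_version="v2" and no overwrite flag pick up the refreshed entry without any code change.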
Interesting- got it. Cc @eapolinario
"Hello 👋, this feature request has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 14 days. Thank you for your contribution and understanding! 🙏"
Flyte 2.0 supports this.