
Add cache invalidation to OnDiskCacheHolder

Open • bilelomrani1 opened this issue 2 years ago • 1 comment

🚀 The feature

Add the ability to automatically invalidate a cached sub-graph when the remote files change after being cached locally.

Motivation, pitch

Say I have multiple files stored in remote object storage. These files are listed with FSSpecFileLister, fed into a datapipe, and cached locally with .on_disk_cache. When one or more remote files change, I want the cache to be invalidated and the datapipe re-computed, probably by comparing the files' hashes.
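A minimal sketch of what such invalidation could look like today, done outside the datapipe, assuming that `.on_disk_cache` re-computes an entry whenever its local file is missing. The helper `invalidate_stale_entries`, the JSON manifest, and the `current_fingerprints` mapping (remote URL to an ETag or content hash reported by the object store) are all hypothetical; how the fingerprints are fetched is left open. `filepath_fn` here is just a plain callable mapping a remote URL to its local cache path.

```python
import json
import os
from typing import Callable, Dict, List


def invalidate_stale_entries(
    manifest_path: str,
    current_fingerprints: Dict[str, str],
    filepath_fn: Callable[[str], str],
) -> List[str]:
    """Hypothetical helper: delete cached copies whose remote fingerprint changed.

    `current_fingerprints` maps each remote URL to a fingerprint supplied by the
    caller (assumption: e.g. an ETag or content hash). Returns removed local paths.
    """
    # Fingerprints recorded the last time this helper ran.
    if os.path.exists(manifest_path):
        with open(manifest_path, "r", encoding="utf-8") as f:
            cached_fingerprints = json.load(f)
    else:
        cached_fingerprints = {}

    removed = []
    for url, old_fp in cached_fingerprints.items():
        # Changed remotely, or no longer listed: drop the local copy.
        if current_fingerprints.get(url) != old_fp:
            local_path = filepath_fn(url)
            if os.path.exists(local_path):
                os.remove(local_path)
                removed.append(local_path)

    # Record the current state so the next run compares against it.
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(current_fingerprints, f, indent=2, sort_keys=True)
    return removed
```

Run before building the pipeline, with the same path mapping used for the cache, the deleted files would look uncached and be fetched again. New URLs need no special handling, since they have no local copy yet.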

Alternatives

No response

Additional context

This feature request originated from this conversation on the PyTorch forum.

bilelomrani1 • Mar 24 '23 19:03

Note that I believe you can currently pass an extra_check_fn to .on_disk_cache that re-computes the hash and flags any difference. However, that will not automatically delete or re-download the files.

NivekT • Mar 24 '23 19:03
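The extra_check_fn workaround suggested in the comment can be sketched as follows, assuming `extra_check_fn` is called with the local cached file path and should return `True` when that file is still valid. `make_hash_check` and the `expected_hashes` mapping (local path to the SHA-256 hex digest of the current remote content) are hypothetical; obtaining those digests from the object store is left open.

```python
import hashlib
from typing import Callable, Dict


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large cached files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_hash_check(expected_hashes: Dict[str, str]) -> Callable[[str], bool]:
    """Hypothetical factory: build a predicate that flags cached files whose content changed."""

    def check(filepath: str) -> bool:
        expected = expected_hashes.get(filepath)
        if expected is None:
            # No reference hash for this file: conservatively treat it as stale.
            return False
        return sha256_of_file(filepath) == expected

    return check
```

The resulting predicate would be passed as `extra_check_fn=make_hash_check(expected_hashes)`. As the comment notes, this only flags a mismatch; it does not delete or re-download anything by itself.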