`datasets/downloads` cleanup tool
Feature request
Splitting this off from https://github.com/huggingface/huggingface_hub/issues/1997 - currently `huggingface-cli delete-cache` doesn't take care of cleaning up `datasets` temp files.
e.g. I discovered millions of files under the `datasets/downloads` cache and had to run:

```shell
sudo find /data/huggingface/datasets/downloads -type f -mtime +3 -exec rm {} \+
sudo find /data/huggingface/datasets/downloads -type d -empty -delete
```
Could this cleanup be integrated into `huggingface-cli`, or a separate tool be provided, to keep these folders tidy so they don't consume inodes and disk space?
e.g. there were tens of thousands of `.lock` files, and I don't know why they never get removed. IMHO a lock file should exist only for the duration of the operation that requires the lock, not remain after the operation has finished.
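Until such a tool exists, stale lock files can be pruned by age, along the same lines as the `find` commands above. This is only a sketch: the 3-day threshold and the cache path are assumptions (the helper takes the directory as an argument; the default cache location is usually `~/.cache/huggingface`), and a file still held by a running job should not be old enough to match.

```shell
#!/bin/sh
# Sketch: prune *.lock files untouched for N days under a given cache dir.
# The directory and the day threshold are assumptions - adjust to your setup,
# e.g. prune_stale_locks "${HF_HOME:-$HOME/.cache/huggingface}/datasets/downloads" 3
prune_stale_locks() {
    cache_dir="$1"
    days="${2:-3}"   # only delete locks not modified for this many days
    find "$cache_dir" -name '*.lock' -type f -mtime +"$days" -delete
}
```

A dry run can be done first by replacing `-delete` with `-print` to see what would go.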
Also, I think one should be able to nuke `datasets/downloads` without hurting the rest of the cache. However, some datasets rely on files extracted under this dir (or at least they did in the past), which makes it very difficult to manage: one has no idea what is safe to delete and what is not.
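Before nuking anything, it helps to at least see what the directory holds. A minimal sketch, assuming the usual layout where raw downloads sit at the top level and unpacked archives live under an `extracted/` subdir (the helper name and the paths are illustrative, not part of any existing tool):

```shell
#!/bin/sh
# Sketch: report what a downloads cache dir holds before deleting anything.
# Takes the cache dir as an argument, e.g.
# report_downloads_cache "${HF_HOME:-$HOME/.cache/huggingface}/datasets/downloads"
report_downloads_cache() {
    cache_dir="$1"
    du -sh "$cache_dir" 2>/dev/null                        # total size
    du -sh "$cache_dir/extracted" 2>/dev/null              # unpacked archives, if any
    find "$cache_dir" -maxdepth 1 -type f | wc -l          # raw downloads at top level
}
```

This doesn't answer the real question of which files are still referenced by a cached dataset, which is exactly why a proper cleanup tool is needed.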
Thank you
@Wauplin (requested to be tagged)