`huggingface-cli delete-cache --disable-tui` improvements
I tried `huggingface-cli delete-cache --disable-tui` for the first time. Great intention, but very problematic to use when one has thousands of hub objects to clean up. Once I understood its quirks I was able to hack around those problems.
- Confusing UX
huggingface-cli delete-cache --disable-tui
TUI is disabled. In order to select which revisions you want to delete, please edit
the following file using the text editor of your choice. Instructions for manual
editing are located at the beginning of the file. Edit the file, save it and confirm
to continue.
File to edit: /tmp/tmpundr7lky.txt
0 revisions selected counting for 0.0. Continue ? (y/N)
The doc says to hit y, but if y is hit the program exits, all the careful manual editing is lost, and the user has to start from scratch (ouch!)
I suspect this is a bug or a problem in the workflow.
- Please don't use a true temp file. Use a file that won't get deleted, so that a user can re-use it should they hit the wrong button - see the first issue above (Confusing UX) as an example.
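A minimal sketch of the re-usable file idea, under stated assumptions: `open_selection_file` is a hypothetical helper, not something the CLI provides, and how the CLI actually creates its file is not shown in this thread. It only illustrates that an existing file is picked up again instead of being regenerated.

```python
from pathlib import Path


def open_selection_file(path: Path, template: str) -> str:
    """Return earlier edits if the file already exists, else write the template.

    Hypothetical helper: `path` would be a stable location chosen by the tool,
    and `template` the freshly generated revision list.
    """
    if path.exists():
        # Re-use what the user already edited instead of starting from scratch.
        return path.read_text(encoding="utf-8")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(template, encoding="utf-8")
    return template
```

With something like this, answering the prompt the wrong way would cost one re-run rather than all the edits.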
- Sorting to have `main` last consistently would help. E.g. the first few entries had a consistent main-last listing, as in:
# Model mistralai/Mistral-7B-v0.1 (29.5G, used 19 hours ago)
5e9c98b96d071dce59368012254c55b0ec6f8658 # Refs: (detached) # modified 3 months ago
17688039ea7b7001c702797eed2ab7716a0cc3c2 # Refs: (detached) # modified 6 weeks ago
# 26bca36bde8333b5d7f72e9ed20ccda6a618af24 # Refs: main # modified 6 weeks ago
So I started to manually uncomment lines thinking `main` is always last, w/o paying close attention. Luckily I caught that this was inconsistent, as I then ran into:
# Model mistralai/Mixtral-8x7B-v0.1 (93.4G, used 10 hours ago)
# 985aa055896a8f943d4a9f2572e6ea1341823841 # Refs: main # modified 5 weeks ago
58301445dc1378584211722b7ebf8743ec4e192b # Refs: (detached) # modified 5 weeks ago
and many other variations. For those who need to edit hundreds of these, it'd be great to have `main` consistently first or last - first would probably be the easiest.
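The main-first ordering could be as simple as a sort key. This is only a sketch: `Revision` is a hypothetical stand-in for whatever the tool keeps per revision, and the hashes are the Mistral-7B ones from the listing above.

```python
from typing import FrozenSet, List, NamedTuple


class Revision(NamedTuple):
    """Hypothetical stand-in for one revision line in the file."""

    commit_hash: str
    refs: FrozenSet[str]


def main_first(revisions: List[Revision]) -> List[Revision]:
    """Order revisions so that anything referenced by `main` comes first."""
    # False sorts before True, so revisions carrying a `main` ref lead the list.
    # The commit hash is only a tie-breaker to keep the order deterministic.
    return sorted(revisions, key=lambda rev: ("main" not in rev.refs, rev.commit_hash))


mistral = [
    Revision("5e9c98b96d071dce59368012254c55b0ec6f8658", frozenset()),
    Revision("17688039ea7b7001c702797eed2ab7716a0cc3c2", frozenset()),
    Revision("26bca36bde8333b5d7f72e9ed20ccda6a618af24", frozenset({"main"})),
]
ordered = main_first(mistral)
print([rev.commit_hash[:7] for rev in ordered])
```

Sorting the detached revisions by modification time instead of by hash would work just as well; the point is only that `main` always lands in the same position.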
- give a user a way to delete all non-main revisions in one go
I tried to do it manually, but it was super slow and I was concerned my edits would get lost again if I hit Y instead of the confusing N (see the first issue above).
In the end I resorted to this hack:
cp /tmp/tmpedbz00ox.txt cache.txt
perl -pi -e 's|^#(.*detached.*)|$1|' cache.txt
cat cache.txt >> /tmp/tmpundr7lky.txt
and hit N, Y, Y
so I wiped out hundreds of old revisions in a second w/o manual editing.
This is usually what users want - keep the main, get rid of old revs - would it be possible to create such an option?
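For reference, here is roughly what the `perl` one-liner above does, written as a Python sketch. The line format is assumed from the listings in this issue (a `#`, a 40-character commit hash, then `# Refs: ...`), and `select_detached` is a hypothetical name. It is a bit stricter than the one-liner: it only uncomments lines that look like a detached revision, so a header line that happens to contain the word "detached" is left alone.

```python
import re

# A commented-out revision line: "#", optional spaces or tabs, a 40-character
# commit hash, then the "Refs: (detached)" marker. Format assumed from the
# listings shown in this issue.
DETACHED_LINE = re.compile(
    r"^#[ \t]*([0-9a-f]{40}[ \t]+# Refs: \(detached\).*)$",
    re.MULTILINE,
)


def select_detached(text: str) -> str:
    """Uncomment every detached revision; `main` and header lines stay as-is."""
    return DETACHED_LINE.sub(r"\1", text)
```

The library also has a Python API (`scan_cache_dir()` and `delete_revisions()` in `huggingface_hub`) that might let a script do the same without editing the file at all, but I have not tried that route here.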
Thank you!
Thanks for the great feedback @stas00! I haven't looked into this for a long time but all suggestions seem to make sense. Will keep you updated when I start working on that :)
Super! The foundation is awesome, @Wauplin - just needs a bit of polish on top.
The other problem this tool doesn't take care of is cleaning downloads/ - I checked and we had 2M files there! I had to do:
sudo find /data/huggingface/datasets/downloads -type f -mtime +3 -exec rm {} \+
sudo find /data/huggingface/datasets/downloads -type d -empty -delete
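A rough Python equivalent of the two `find` commands, in case something like this ends up in a cleanup script. `prune_downloads` is a hypothetical helper, not part of any library. It also differs from the commands above in two small ways: it compares against an exact cutoff rather than `find`'s whole-day rounding for `-mtime +3`, and it never removes the top-level directory itself.

```python
import os
import time
from pathlib import Path


def prune_downloads(root: Path, max_age_days: float = 3.0) -> int:
    """Delete regular files older than `max_age_days`, then drop empty dirs.

    Returns the number of files removed. The `root` directory is always kept.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    # Walk bottom-up so child directories are handled before their parents.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks, like `find -type f` does by default.
            if path.is_symlink() or not path.is_file():
                continue
            if path.stat().st_mtime < cutoff:
                path.unlink()
                removed += 1
        # Re-list the directory: it may have just become empty.
        if Path(dirpath) != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
    return removed
```

It would be called as `prune_downloads(Path("/data/huggingface/datasets/downloads"))`, with whatever permissions the directory needs.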
I'm not sure if huggingface-cli could take care of datasets as well, since they come from the hub. Please let me know if I should open a separate issue, since it's related but not the same.
@stas00 datasets cache is another topic! At the moment datasets doesn't use the default cache shared between huggingface_hub, transformers, diffusers, etc. We will have to fix that first before providing a CLI to clean this cache too (cc @lhoestq for visibility).
Do you want me to open an Issue here or on the datasets repo?
Better for the datasets repo but please tag me. Thanks!
Done: https://github.com/huggingface/datasets/issues/6614