generalizing the git status file retrieval functions
feature request: getting various git status entries.
I needed other functions similar to this one:
https://github.com/huggingface/huggingface_hub/blob/71beeb17ef07b82b6501cfd8a30012fc934b787b/src/huggingface_hub/repository.py#L405
So I generalized the code into get_git_files_by_status, which returns all files reported by git status -s, already grouped by status for consumption. It uses this lookup:
git_status_lookup = {
    "?": "untracked",
    "M": "modified",
    "A": "added",
    "D": "deleted",
    "R": "renamed",
    "C": "copied",
    "U": "updated_unmerged",
}
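For readers who don't want to open the link, here is a minimal sketch of how such a grouping could work. It is not the code from hub-sync.py: the helper name parse_git_status and the local_dir parameter are my own assumptions, and it relies only on the documented short format of git status -s (two status characters, a space, then the path).

```python
import subprocess
from collections import defaultdict

git_status_lookup = {
    "?": "untracked",
    "M": "modified",
    "A": "added",
    "D": "deleted",
    "R": "renamed",
    "C": "copied",
    "U": "updated_unmerged",
}


def parse_git_status(output):
    """Group paths from `git status -s` output by status name.

    Hypothetical helper (not from the linked script). Each line is
    "XY path", where X is the index status and Y the worktree status.
    """
    files = defaultdict(list)
    for line in output.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY " plus a path
        xy, path = line[:2], line[3:]
        # A file can appear under two groups, e.g. "AM" is added and modified.
        # Renames keep git's "old -> new" form in `path`.
        for code in sorted(set(xy.replace(" ", ""))):
            name = git_status_lookup.get(code)
            if name is not None:
                files[name].append(path)
    return dict(files)


def get_git_files_by_status(local_dir="."):
    """Run `git status -s` in local_dir (assumed parameter) and group the result."""
    result = subprocess.run(
        ["git", "status", "-s"],
        cwd=local_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    # Do not strip stdout: a leading space is part of the status column.
    return parse_git_status(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it easy to test without a real repo. Note that git quotes paths containing special characters, which this sketch does not undo.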
So if you think it'd be a useful addition, you can adopt it from here:
https://github.com/bigscience-workshop/bigscience/blob/34506550665e8fd415939036f15e88c6b218c9cc/tools/hub-sync.py#L67-L125
You would just need to add self to the functions to turn them into methods.
I also had to figure out how to set up the program to commit as someone other than me, which is the other main part of that script and which you may also want to adopt. It was quite tricky to figure out, but it works now.
The credentials for committing as someone else come from a config generated by: https://github.com/bigscience-workshop/bigscience/blob/master/tools/hub-auth.py
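As a rough illustration of the commit-identity part only, here is a sketch that overrides the author for a single commit using git's -c option. This is an assumption about one way to do it, not what hub-sync.py or hub-auth.py actually do: the JSON config layout (name and email keys) and both function names are hypothetical, and authentication for pushing is not covered.

```python
import json
import subprocess


def build_commit_command(message, name, email):
    """Build a `git commit` command that overrides the identity for one commit.

    Hypothetical helper. `-c key=value` must come before the subcommand.
    """
    return [
        "git",
        "-c", f"user.name={name}",
        "-c", f"user.email={email}",
        "commit",
        "-m", message,
    ]


def commit_as(local_dir, message, config_path):
    """Commit staged changes in local_dir using an identity read from a file.

    The config format here (JSON with "name" and "email") is an assumption
    made for illustration, not the format produced by hub-auth.py.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    cmd = build_commit_command(message, cfg["name"], cfg["email"])
    subprocess.run(cmd, cwd=local_dir, check=True)
```

Passing the identity per command avoids touching the user's global or repo-level git config, which matters when the same machine also commits as a real person.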
You can see hub-sync.py auto-committing to various repos on the hub in real time via a slurm job, e.g.: https://huggingface.co/bigscience/tr1-13B-tensorboard/tensorboard
Thanks.
Thanks for sharing @stas00! Will take a look at your get_git_files_by_status method.
It's interesting to see how you handled hub-auth.py; we'll take a look at what we can do to make that easier.