
generalizing the git status file retrieval functions

Open stas00 opened this issue 4 years ago • 1 comments

Feature request: getting various git status entries.

I needed functions similar to this one, but for other git statuses:

https://github.com/huggingface/huggingface_hub/blob/71beeb17ef07b82b6501cfd8a30012fc934b787b/src/huggingface_hub/repository.py#L405

So I generalized the code with get_git_files_by_status, which returns all files reported by git status -s, already grouped by status for consumption. The single-letter status codes map to group names like this:

git_status_lookup = {
    "?": "untracked",
    "M": "modified",
    "A": "added",
    "D": "deleted",
    "R": "renamed",
    "C": "copied",
    "U": "updated_unmerged",
}

If you think it'd be a useful addition, you can adopt it from here: https://github.com/bigscience-workshop/bigscience/blob/34506550665e8fd415939036f15e88c6b218c9cc/tools/hub-sync.py#L67-L125. You'd just need to change the functions to take self so they work as methods.
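For readers who don't want to follow the link, here is a minimal sketch of the idea, not the code from hub-sync.py. The helper name parse_git_status_short and the local_dir parameter are my own assumptions for illustration. It assumes the two-column XY format that git status -s prints, and it does not handle paths that git quotes (e.g. ones with special characters).

```python
import subprocess
from collections import defaultdict

git_status_lookup = {
    "?": "untracked",
    "M": "modified",
    "A": "added",
    "D": "deleted",
    "R": "renamed",
    "C": "copied",
    "U": "updated_unmerged",
}


def parse_git_status_short(output):
    """Group paths from `git status -s` text by status name (hypothetical helper)."""
    files = defaultdict(list)
    for line in output.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY path"
        # Columns 0-1 are the index/worktree status codes, the path starts at column 3.
        codes, path = line[:2], line[3:]
        # Renames and copies are printed as "old -> new"; keep the new path.
        if ("R" in codes or "C" in codes) and " -> " in path:
            path = path.split(" -> ", 1)[1]
        # A file can carry two different codes (e.g. "AM"); list it under both.
        for code in set(codes):
            status = git_status_lookup.get(code)
            if status is not None:
                files[status].append(path)
    return dict(files)


def get_git_files_by_status(local_dir="."):
    """Run `git status -s` in local_dir and return the files grouped by status."""
    result = subprocess.run(
        ["git", "status", "-s"],
        cwd=local_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_git_status_short(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the grouping logic easy to test without a real repository. A status with no files is simply absent from the returned dict.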


I also had to figure out how to set up the program to commit as someone other than me. That is the other main part of the script, and you may want to adopt it too. It was quite tricky to figure out, but it works now.

The credentials for authenticating as someone else are read from a config generated by: https://github.com/bigscience-workshop/bigscience/blob/master/tools/hub-auth.py
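To illustrate just the "commit as not me" part, here is a rough sketch that uses git's standard -c option to override user.name and user.email for a single commit. This is not how hub-auth.py or hub-sync.py do it. The function names and example identity are hypothetical, and authenticating the push to the hub is deliberately left out.

```python
import subprocess


def build_commit_command(name, email, message):
    """Build a `git commit` command that overrides the committer identity (hypothetical helper)."""
    return [
        "git",
        "-c", f"user.name={name}",    # one-off override, the user's git config is untouched
        "-c", f"user.email={email}",
        "commit",
        "-m", message,
    ]


def commit_as(local_dir, name, email, message):
    """Commit already-staged changes in local_dir under the given identity."""
    subprocess.run(build_commit_command(name, email, message), cwd=local_dir, check=True)
```

For example, commit_as("path/to/clone", "sync-bot", "sync-bot@example.com", "new data") would record that name and email on the commit instead of the identity of the user running the script.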


You can see this script auto-committing to various repos on the hub in real time via a slurm job, e.g.: https://huggingface.co/bigscience/tr1-13B-tensorboard/tensorboard

Thanks.

stas00, Aug 06 '21 03:08

Thanks for sharing @stas00! Will take a look at your get_git_files_by_status method.

Interesting to see how you managed hub-auth.py, will take a look at what we can do to make that easier.

LysandreJik, Aug 06 '21 06:08