allow specifying HTTP Basic auth parameters in config file (or netrc) for dvc import
I'd like to place dvc references in my git repo to data hosted on e.g. huggingface. But the models are private. If I do:
dvc import https://huggingface.co/spaces/foo/bar mymodel.bin
I get prompted for a username/password. I'd like to avoid this username/password prompt by being able to specify the username/password in a local config file. I cannot figure out how to do this with dvc, and I suspect it's not possible (but please tell me if it is!). What I'd want to work is either in .dvc/config (well, .dvc/config.local, but any of them):
['url "https://huggingface.co/"']
user = foo
password = bar
alternatively, look for any remote that has the same URL configured as what's passed to dvc import; so:
['remote "hf"']
url = https://huggingface.co/
and in config.local:
['remote "hf"']
user = foo
password = bar
Another alternate option -- use a ~/.netrc file like requests or curl does. aiohttp doesn't have native support for netrc, but the file format is trivial.
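To illustrate how little is needed to read a netrc file, here is a minimal sketch using Python's standard-library netrc module. It is not how dvc or aiohttp do it internally; the helper name lookup_credentials and the path passed in are made up for the example.

```python
import netrc
from typing import Optional, Tuple


def lookup_credentials(netrc_path: str, host: str) -> Optional[Tuple[str, str]]:
    """Return (login, password) for `host` from a netrc file, or None.

    Hypothetical helper for illustration only. Note that
    netrc.authenticators() falls back to a `default` entry
    if the file has one and no `machine` line matches.
    """
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password
```

Calling lookup_credentials with the path to a netrc file and "huggingface.co" would return the login/password pair from the matching machine entry, which a client could then send as HTTP Basic auth.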
Hey @vvuk, I think there are a few options:
- Use global or system config. E.g. for the dvc remote modify call here
- You can also pass config directly to the dvc import command - https://dvc.org/doc/command-reference/import#--remote-config
Let me know if any of this works for you.
Heya, thanks for the quick reply!
Use global or system config. E.g. for the dvc remote modify call here
Hmm, can you give me an example here? I've got a remote configured -- it doesn't seem to matter whether it's global or project or local.
You can also pass config directly to the dvc import command - https://dvc.org/doc/command-reference/import#--remote-config
Hmm -- this one still requires command line work, but I'm actually not sure how remotes come into play at all right now.
Here's a more specific example/set of commands with a public gated model. I created a new git/dvc repo (git init ; dvc init):
dvc import https://huggingface.co/google/gemma-2b README.md
this prompts for username/password.
I can create a remote for huggingface (note: doesn't matter if I use global/system/project/local, same result):
dvc remote add --global hf https://huggingface.co/
dvc remote modify --global hf user ...
dvc remote modify --global hf password hf_abc....
but doing the same dvc import above still prompts for username password. Same result with dvc import --remote hf https://huggingface.co/google/gemma-2b README.md (which I'd expect, since I believe --remote just sets where the data would be pushed to with a dvc push).
I don't think the remote is being considered at all... the only way it could be is if it was matched by URL prefix, and I don't think that's happening?
The concrete use case is I've got a bunch of private models on huggingface that I'd like to reference in my repo. But I'd like each of my developers to use their own huggingface credentials to pull them down when doing dvc pull. The only thing that works (as expected) is explicitly providing the user/pass in the URL:
dvc import https://user:pass@huggingface.co/google/gemma-2b README.md
but then my username/password is stored in the .dvc file.
Ah, I see. I think I misunderstood the question. I see now that it's the user / password for HF itself.
Since this is pretty much about the Git protocol, you should probably configure Git so that git clone https://huggingface.co/spaces/foo/ works. I think Git supports quite a few ways to manage credentials.
+1 for this
.netrc support is a blocker for us adopting DVC
.netrc is supported by aiohttp since 3.9.0 which was released in November of 2023.
See https://github.com/aio-libs/aiohttp/pull/7131.
I have just tested this with dvc, and it works fine (we set trust_env=True which is what is needed for aiohttp to parse netrc).
Thanks for pointing out the trust_env=True option.
I tried to set it but get an error:
dvc config --global core.trust_env true
ERROR: configuration error - config file error: extra keys not allowed @ data['core']['trust_env']
What am I doing wrong?
You don't have to set anything. trust_env is an aiohttp.ClientSession option, and dvc enables it by default. There is no config option for it.
Make sure that the ~/.netrc file exists, is readable, and has an entry for the domain.
On Windows, the file used is $USERPROFILE/_netrc.
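The three checks above (file exists, is readable, has an entry for the domain) can be sketched with the standard library. This is only a diagnostic sketch, not something dvc ships; the function names and the status strings are made up, and it only looks for an exact machine match, ignoring any default entry.

```python
import netrc
import os
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit


def default_netrc_path() -> Path:
    # ~/.netrc on POSIX, _netrc in the user profile directory on Windows.
    name = "_netrc" if os.name == "nt" else ".netrc"
    return Path.home() / name


def diagnose_netrc(url: str, path: Optional[Path] = None) -> str:
    """Return a short status string describing the netrc setup for `url`."""
    path = path or default_netrc_path()
    host = urlsplit(url).hostname
    if host is None:
        return "url has no host"
    if not path.is_file():
        return "missing"
    if not os.access(path, os.R_OK):
        return "unreadable"
    try:
        entries = netrc.netrc(str(path))
    except netrc.NetrcParseError:
        return "invalid"
    # `hosts` maps each `machine` name to its (login, account, password).
    return "ok" if host in entries.hosts else "no entry"
```

Running diagnose_netrc with the URL you are fetching from tells you which of the three conditions fails, if any.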
Thanks for your answer. The file ~/.netrc exists and has the following content:
cat ~/.netrc
machine PRIVATE_GITLAB_SERVER_DOMAIN
login __token__
password TOKEN
And this is the config (data.dvc) in my local repo:
md5: b2880ec09b9967374c502cb377860abb
frozen: true
deps:
- path: data
repo:
url: https://PRIVATE_GITLAB_SERVER_DOMAIN/some/dvc/repo.git
rev_lock: 11b4678a3443e5977a963118bc73646a8bf1dc2b
remote: some_remote
outs:
- md5: ad68b2c2ae8e43978564d11b67fd6b82.dir
size: 53569389403
nfiles: 310210
hash: md5
path: data
I only replaced the machine value and password value in ~/.netrc and the url value and remote value in data.dvc.
I always get asked for username and password unfortunately.
@luh-t-to, what config options do you have set in .dvc/config? If you want to use netrc, you'll have to unset relevant remote configs like ask_password, auth, user, password, custom_auth_header, etc.
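A rough way to spot such options is to scan the config text for them. The sketch below uses Python's configparser as an approximation (dvc reads its config with its own parser, so this is an assumption that holds for simple files), and the key list is just the one named above, which ends in "etc." and is not exhaustive. The function name is made up.

```python
import configparser
from typing import Dict, List

# Option names taken from the comment above; not an exhaustive list.
AUTH_KEYS = ("ask_password", "auth", "user", "password", "custom_auth_header")


def conflicting_auth_options(config_text: str) -> Dict[str, List[str]]:
    """Map each remote section to the auth-related keys it sets."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    found: Dict[str, List[str]] = {}
    for section in parser.sections():
        # Remote sections look like ['remote "hf"'], quotes included.
        if not section.lstrip("'\"").startswith("remote "):
            continue
        keys = [key for key in AUTH_KEYS if parser.has_option(section, key)]
        if keys:
            found[section] = keys
    return found
```

An empty result for .dvc/config, .dvc/config.local and the global config would suggest that none of these options is overriding netrc.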
Actually, the .dvc/config file is empty.
Sorry, looks like I mixed up import and import-url. netrc works for importing from an http(s) URL or fetching from an http(s) remote.
It does not work for import, where dvc clones from a Git repository. For that case I think git-credentials, which dvc already supports, is a much better solution than .netrc.
You can enable it using:
git config --global credential.helper store
Once that is set, it will prompt you for the password the first time, and dvc will save the credentials to the store. Subsequent runs will reuse the saved credentials.
See:
- https://stackoverflow.com/questions/35942754/how-can-i-save-username-and-password-in-git.
- https://git-scm.com/docs/gitcredentials
You can also use other credential managers, including your os keychain.
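For reference, the store helper keeps one URL per line with the credentials embedded (https://user:pass@host, percent-encoded), unencrypted on disk. Below is a simplified sketch of looking up an entry in that format. It is not Git's actual matching logic (it ignores ports and paths), and the function name is made up.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import unquote, urlsplit


def find_stored_credential(
    lines: Iterable[str], url: str
) -> Optional[Tuple[str, str]]:
    """Return (username, password) from store-format lines matching `url`.

    Simplification: only scheme and host are compared.
    """
    target = urlsplit(url)
    for line in lines:
        entry = urlsplit(line.strip())
        if entry.scheme != target.scheme or entry.hostname != target.hostname:
            continue
        if entry.username is None or entry.password is None:
            continue
        # Stored values are percent-encoded, so decode them.
        return unquote(entry.username), unquote(entry.password)
    return None
```

Because the file is plain text, an OS keychain or another credential manager is the safer choice on shared machines.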