allow specifying HTTP Basic auth parameters in config file (or netrc) for dvc import

Open vvuk opened this issue 1 year ago • 4 comments

I'd like to place dvc references in my git repo to data hosted on e.g. huggingface. But the models are private. If I do:

dvc import https://huggingface.co/spaces/foo/bar mymodel.bin

I get prompted for a username/password. I'd like to avoid this username/password prompt by being able to specify the username/password in a local config file. I cannot figure out how to do this with dvc, and I suspect it's not possible (but please tell me if it is!). What I'd want to work is either in .dvc/config (well, .dvc/config.local, but any of them):

['url "https://huggingface.co/"']
    user = foo
    password = bar

alternatively, look for any remote that has the same URL configured as what's passed to dvc import; so:

['remote "hf"']
    url = https://huggingface.co/

and in config.local:

['remote "hf"']
  user = foo
  password = bar

Another alternate option -- use a ~/.netrc file like requests or curl does. aiohttp doesn't have native support for netrc, but the file format is trivial.

vvuk avatar Nov 15 '24 20:11 vvuk

Hey @vvuk , I think there are a few options:

  1. Use global or system config. E.g. for the dvc remote modify call here
  2. You can also pass config directly to the dvc import command - https://dvc.org/doc/command-reference/import#--remote-config

Let me know if any of this works for you.

shcheklein avatar Nov 16 '24 00:11 shcheklein

Heya, thanks for the quick reply!

Use global or system config. E.g. for the dvc remote modify call here

Hmm, can you give me an example here? I've got a remote configured -- it doesn't seem to matter whether it's global or project or local.

You can also pass config directly to the dvc import command - https://dvc.org/doc/command-reference/import#--remote-config

Hmm -- this one still requires command line work, but I'm actually not sure how remotes come into play at all right now.

Here's a more specific example/set of commands with a public gated model. I created a new git/dvc repo (git init ; dvc init):

dvc import https://huggingface.co/google/gemma-2b README.md

this prompts for username/password.

I can create a remote for huggingface (note: doesn't matter if I use global/system/project/local, same result):

dvc remote add --global hf https://huggingface.co/
dvc remote modify --global hf user ...
dvc remote modify --global hf password hf_abc....

but doing the same dvc import above still prompts for username password. Same result with dvc import --remote hf https://huggingface.co/google/gemma-2b README.md (which I'd expect, since I believe --remote just sets where the data would be pushed to with a dvc push).

I don't think the remote is being considered at all... the only way it could be is if it was matched by URL prefix, and I don't think that's happening?

The concrete use case is I've got a bunch of private models on huggingface that I'd like to reference in my repo. But I'd like each of my developers to use their own huggingface credentials to pull them down when doing dvc pull. The only thing that works (as expected) is explicitly providing the user/pass in the URL:

dvc import https://user:pass@huggingface.co/google/gemma-2b README.md

but then my username/password is stored in the .dvc file.
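To illustrate the concern, here is a minimal Python sketch that detects and strips `user:password@` userinfo from a URL before it is written to a tracked file. The helper name `strip_credentials` is hypothetical and not part of dvc; it uses only the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url: str) -> tuple[str, bool]:
    """Return (url_without_userinfo, had_credentials).

    Hypothetical helper for illustration only; dvc does not provide it.
    """
    parts = urlsplit(url)
    had_credentials = parts.username is not None or parts.password is not None

    # Rebuild the network location from host and port only.
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals need their brackets restored.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, had_credentials
```

A check like this could run over the `url:` values in .dvc files before committing, so that a URL with embedded credentials is caught early.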

vvuk avatar Nov 16 '24 00:11 vvuk

Ah, I see. I think I misunderstood the question. I see now that it's the user/password for HF itself.

Since this is essentially the Git protocol, you should probably configure Git so that git clone https://huggingface.co/spaces/foo/ works. I think Git supports quite a few ways to manage credentials.

shcheklein avatar Nov 16 '24 01:11 shcheklein

+1 for this

.netrc support is a blocker for us adopting DVC

nikita240 avatar May 20 '25 20:05 nikita240

.netrc is supported by aiohttp since 3.9.0 which was released in November of 2023.

See https://github.com/aio-libs/aiohttp/pull/7131.

I have just tested this with dvc, and it works fine (we set trust_env=True which is what is needed for aiohttp to parse netrc).

skshetry avatar Aug 08 '25 13:08 skshetry

Thanks for pointing out the trust_env=True option.

I tried to set it but get an error:

dvc config --global core.trust_env true
ERROR: configuration error - config file error: extra keys not allowed @ data['core']['trust_env']

What am I doing wrong?

luh-t-to avatar Aug 08 '25 14:08 luh-t-to

You don't have to set anything. trust_env is an aiohttp.ClientSession option, and dvc sets it by default. There is no config option.

Make sure that the ~/.netrc file exists, is readable, and contains an entry for the domain.

On Windows, the file used is $USERPROFILE/_netrc.
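The three checks above (the file exists, is readable, and has an entry for the domain) can be sketched with Python's standard `netrc` module. This is only a diagnostic sketch: the helper names `default_netrc_path` and `check_netrc` are hypothetical, and the `NETRC` environment variable override is an assumption about how the file is located, not something confirmed in this thread.

```python
import netrc
import os
import platform
from pathlib import Path


def default_netrc_path() -> Path:
    """Return the netrc location described above (hypothetical helper)."""
    # Assumption: a NETRC environment variable, if set, overrides the default.
    override = os.environ.get("NETRC")
    if override:
        return Path(override)
    # ~/.netrc on POSIX, _netrc in the user profile directory on Windows.
    name = "_netrc" if platform.system() == "Windows" else ".netrc"
    return Path.home() / name


def check_netrc(host: str, path: Path) -> str:
    """Report whether `path` is usable and has an explicit entry for `host`."""
    if not path.is_file():
        return f"{path}: file not found"
    if not os.access(path, os.R_OK):
        return f"{path}: not readable"
    try:
        entries = netrc.netrc(str(path))
    except netrc.NetrcParseError as exc:
        return f"{path}: parse error: {exc}"
    # Inspect explicit "machine" entries only, ignoring any "default" entry.
    if host not in entries.hosts:
        return f"{path}: no 'machine {host}' entry"
    login, _account, _password = entries.hosts[host]
    # Never print the password; the login is enough to confirm the match.
    return f"{path}: entry found for {host} (login {login})"
```

Calling `check_netrc("PRIVATE_GITLAB_SERVER_DOMAIN", default_netrc_path())` with the real host name prints a one-line diagnosis without revealing the token.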

skshetry avatar Aug 08 '25 14:08 skshetry

Thanks for your answer. The file ~/.netrc exists and has the following content:

cat ~/.netrc 
machine PRIVATE_GITLAB_SERVER_DOMAIN
login __token__
password TOKEN

And this is the config (data.dvc) in my local repo:

md5: b2880ec09b9967374c502cb377860abb
frozen: true
deps:
- path: data
  repo:
    url: https://PRIVATE_GITLAB_SERVER_DOMAIN/some/dvc/repo.git
    rev_lock: 11b4678a3443e5977a963118bc73646a8bf1dc2b
    remote: some_remote
outs:
- md5: ad68b2c2ae8e43978564d11b67fd6b82.dir
  size: 53569389403
  nfiles: 310210
  hash: md5
  path: data

I only replaced the machine value and password value in ~/.netrc and the url value and remote value in data.dvc.

I always get asked for username and password unfortunately.

luh-t-to avatar Aug 08 '25 14:08 luh-t-to

@luh-t-to, what config options do you have set in .dvc/config? If you want to use netrc, you'll have to unset relevant remote configs like ask_password, auth, user, password, custom_auth_header, etc.

skshetry avatar Aug 08 '25 14:08 skshetry

Actually, the .dvc/config file is empty.

luh-t-to avatar Aug 08 '25 14:08 luh-t-to

Sorry, looks like I mixed up import and import-url. netrc works when importing from an HTTP(S) URL or fetching from an HTTP(S) remote.

It does not work for import, where we clone from a Git repository. For that case I think git-credentials, which dvc already supports, is a much better solution than .netrc.

You can enable it using:

git config --global credential.helper store

Once that is set, you will be prompted for the password the first time, and dvc will save the credentials to the store. Subsequent runs reuse the stored credentials.

See:

  1. https://stackoverflow.com/questions/35942754/how-can-i-save-username-and-password-in-git.
  2. https://git-scm.com/docs/gitcredentials

You can also use other credential managers, including your os keychain.
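To check whether a credential helper already holds credentials for a host, one option is to ask Git directly through `git credential fill`. The sketch below is a rough diagnostic under a few assumptions: the helper names are hypothetical, `git` is on the PATH, and `GIT_TERMINAL_PROMPT=0` is enough to suppress the interactive prompt (an askpass program, if one is configured, may still be invoked).

```python
import os
import subprocess
from typing import Optional


def credential_query(protocol: str, host: str) -> str:
    """Build the key=value input that `git credential fill` reads on stdin."""
    # A blank line terminates the credential description.
    return f"protocol={protocol}\nhost={host}\n\n"


def parse_credential_output(text: str) -> dict:
    """Parse key=value lines printed by `git credential fill`."""
    result = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            result[key] = value
    return result


def stored_credential(protocol: str, host: str) -> Optional[dict]:
    """Return stored credentials for the host, or None if Git has none."""
    # Disable the terminal prompt so a missing credential fails fast.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    proc = subprocess.run(
        ["git", "credential", "fill"],
        input=credential_query(protocol, host),
        capture_output=True,
        text=True,
        env=env,
    )
    if proc.returncode != 0:
        return None
    return parse_credential_output(proc.stdout)
```

If `stored_credential("https", "huggingface.co")` returns a dictionary with a `username` key, the helper is configured and has an entry for that host; avoid printing the `password` value.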

skshetry avatar Aug 08 '25 15:08 skshetry