fetch: fails when using url.insteadOf in git config
Bug Report
Description
I'm using a DVC-imported asset in a project. In a certain environment, I'm using url.insteadOf to replace the URL of the repo from which the asset is imported. In my particular case, I'm replacing an SSH URL with a path URL. However, the clone of that remote repo fails right here:
https://github.com/iterative/dvc/blob/6ace5ed380b2f9e35ea0edc4e8ed5c48d576050b/dvc/scm.py#L160-L162
The first call to Git.clone() succeeds because the URL is properly replaced. However, in the call to fetch_all_exps, the value of url being passed is NOT the replaced one (the replaced URL is what gets stored in the cloned repo's config as the remote's URL), so the fetch fails. Another potentially relevant section is in scmrepo.git.backend.dulwich.iter_remote_refs():
https://github.com/iterative/scmrepo/blob/f70e0323746f22833581c26efbfcccb285ddb845/src/scmrepo/git/backend/dulwich/__init__.py#L492-L500
It's possible this behavior should be handled by the upstream packages (scmrepo or dulwich), but I'm starting the discussion here.
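To make the mismatch concrete, here is a minimal sketch of how git-style url.<base>.insteadOf rewriting behaves (longest matching prefix wins). The function name apply_instead_of, the rules dict, and the example SSH URL are all hypothetical and only for illustration; this is not the code used by DVC, scmrepo, or dulwich.

```python
from typing import Dict, Optional


def apply_instead_of(url: str, rules: Dict[str, str]) -> str:
    """Rewrite `url` using git-style url.<base>.insteadOf rules.

    `rules` maps each insteadOf prefix to the <base> that replaces it.
    When several prefixes match, the longest one is used, as git does.
    """
    best: Optional[str] = None
    for prefix in rules:
        if url.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        return url  # no rule applies, so the URL is used unchanged
    return rules[best] + url[len(best):]


# Hypothetical values mirroring the report: an SSH URL replaced by a local path.
SSH_URL = "git@example.com:org/repo.git"
rules = {SSH_URL: "/tmp/remote"}

# What a config-aware clone would effectively use.
print(apply_instead_of(SSH_URL, rules))
```

If a later fetch is handed the original SSH_URL and the backend does not apply these rules itself, it would presumably still attempt SSH access, which is consistent with the failure described above.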
Reproduce
- dvc import an asset from any project using an SSH URL (or just make a dummy .dvc file)
- Clone that same repo to e.g. /tmp/remote
- git config --global url./tmp/remote.insteadOf ${SSH_URL}
- SSH_AUTH_SOCK= dvc pull asset.dvc (SSH_AUTH_SOCK here is just an example if you're using ssh-agent. The point is to do this in an env without creds for SSH access.)
Expected
The pull should succeed!
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.58.2 (pip)
-------------------------
Platform: Python 3.8.16 on Linux-6.2.6-76060206-generic-x86_64-with-glibc2.2.5
Subprojects:
dvc_data = 0.51.0
dvc_objects = 0.22.0
dvc_render = 0.3.1
dvc_task = 0.2.1
scmrepo = 1.0.2
Supports:
http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
s3 (s3fs = 2023.3.0, boto3 = 1.24.59),
ssh (sshfs = 2023.4.1)
Config:
Global: /home/kernel/.config/dvc
System: /etc/xdg/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: s3, ssh
Workspace directory: overlay on overlay
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/a6c21da1aef04b4fdc4a48db8508fea3
Additional Information (if any)
The stack trace is very long, but I think I've pointed out the relevant sections above.
Related, but another way to solve my problem would be if one could use remote://${remote_name} as the repo.url in .dvc files, as then the URL could be overridden by DVC configs at the very beginning. Open to anything that solves this problem, including opening a PR myself with the preferred approach :)
Thanks for the detailed report, @sjawhar.
Slightly unrelated, but dvc import should not try to fetch_all_exps in this case. This is a ~similar problem to #9117 where commands failed when trying to get exp refs unnecessarily.