get / import: No storage files available
Bug Report
Description
I have tracked files in repo-a under data. Both dvc import and dvc get fail when trying to get files from repo-a in repo-b.
Reproduce
I cloned my own repo (repo-a) under /tmp to test whether dvc pull works. It does. Then I checked the status and the remote:
[/tmp/repo-a] [master *]
-> % uv run dvc status -c
Cache and remote 'azure-blob' are in sync.
[/tmp/repo-a] [master *]
-> % uv run dvc list --dvc-only .
data
So that is all correct.
Then I go to my repo-b. I configured its remote to be the same as repo-a's. Here is the check:
[repo-b] [master *]
-> % diff .dvc/config.local /tmp/repo-a/.dvc/config.local | wc -l
0
Then I try to get the data from repo-a. It fails:
[repo-b] [master *]
-> % uv run dvc list "[email protected]:<org>/repo-a.git" --dvc-only
data
[repo-b] [master *]
-> % uv run dvc get "[email protected]:<org>/repo-a.git" "data" -v
2024-09-30 13:41:19,905 DEBUG: v3.55.2 (pip), CPython 3.10.14 on Linux-6.8.0-45-generic-x86_64-with-glibc2.35
2024-09-30 13:41:19,906 DEBUG: command: /.../repo-b/.venv/bin/dvc get [email protected]:<org>/repo-a.git data -v
2024-09-30 13:41:19,985 DEBUG: Creating external repo [email protected]:<org>/repo-a.git@None
2024-09-30 13:41:19,985 DEBUG: erepo: git clone '[email protected]:<org>/repo-a.git' to a temporary dir
2024-09-30 13:41:42,394 DEBUG: failed to load ('data', 'cvat', 'datumaro-dataset') from storage local (/tmp/tmpsuoa_qcgdvc-cache/files/md5) - [Errno 2] No such file or directory: '/tmp/tmpsuoa_qcgdvc-cache/files/md5/8a/6de34918ed22935e97644bf465f920.dir'
Traceback (most recent call last):
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 611, in _load_from_storage
_load_from_object_storage(trie, entry, storage)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 547, in _load_from_object_storage
obj = Tree.load(storage.odb, root_entry.hash_info, hash_name=storage.odb.hash_name)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/hashfile/tree.py", line 193, in load
with obj.fs.open(obj.path, "r") as fobj:
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 324, in open
return self.fs.open(path, mode=mode, **kwargs)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_objects/fs/local.py", line 131, in open
return open(path, mode=mode, encoding=encoding) # noqa: SIM115
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpsuoa_qcgdvc-cache/files/md5/8a/6de34918ed22935e97644bf465f920.dir'
2024-09-30 13:41:42,401 ERROR: unexpected error - failed to load directory ('data', 'cvat', 'datumaro-dataset'): [Errno 2] No such file or directory: '/tmp/tmpsuoa_qcgdvc-cache/files/md5/8a/6de34918ed22935e97644bf465f920.dir'
Traceback (most recent call last):
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 611, in _load_from_storage
_load_from_object_storage(trie, entry, storage)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 547, in _load_from_object_storage
obj = Tree.load(storage.odb, root_entry.hash_info, hash_name=storage.odb.hash_name)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/hashfile/tree.py", line 193, in load
with obj.fs.open(obj.path, "r") as fobj:
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 324, in open
return self.fs.open(path, mode=mode, **kwargs)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_objects/fs/local.py", line 131, in open
return open(path, mode=mode, encoding=encoding) # noqa: SIM115
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpsuoa_qcgdvc-cache/files/md5/8a/6de34918ed22935e97644bf465f920.dir'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/cli/__init__.py", line 211, in main
ret = cmd.do_run()
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/cli/command.py", line 41, in do_run
return self.run()
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/commands/get.py", line 30, in run
return self._get_file_from_repo()
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/commands/get.py", line 37, in _get_file_from_repo
Repo.get(
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/repo/get.py", line 64, in get
download(fs, fs_path, os.path.abspath(out), jobs=jobs)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/fs/__init__.py", line 67, in download
return fs._get(fs_path, to, batch_size=jobs, callback=cb)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/fs/dvc.py", line 692, in _get
return self.fs._get(
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/fs/dvc.py", line 543, in _get
for root, dirs, files in self.walk(rpath, maxdepth=maxdepth, detail=True):
File "/.../repo-b/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 468, in walk
yield from self.walk(
File "/.../repo-b/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 468, in walk
yield from self.walk(
File "/.../repo-b/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 427, in walk
listing = self.ls(path, detail=True, **kwargs)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/fs/dvc.py", line 382, in ls
for info in dvc_fs.ls(dvc_path, detail=True):
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 519, in ls
return self.fs.ls(path, detail=detail, **kwargs)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/fs.py", line 164, in ls
for key, info in self.index.ls(root_key, detail=True):
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 764, in ls
self._ensure_loaded(root_key)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 761, in _ensure_loaded
self._load(prefix, entry)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 710, in _load
self.onerror(entry, exc)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 638, in _onerror
raise exc
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 708, in _load
_load_from_storage(self._trie, entry, storage_info)
File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 626, in _load_from_storage
raise DataIndexDirError(f"failed to load directory {entry.key}") from last_exc
dvc_data.index.index.DataIndexDirError: failed to load directory ('data', 'cvat', 'datumaro-dataset')
2024-09-30 13:41:42,432 DEBUG: Version info for developers:
DVC version: 3.55.2 (pip)
-------------------------
Platform: Python 3.10.14 on Linux-6.8.0-45-generic-x86_64-with-glibc2.35
Subprojects:
dvc_data = 3.16.6
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.4.0
scmrepo = 3.3.8
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.18.0),
http (aiohttp = 3.10.8, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.8, aiohttp-retry = 2.8.3)
Config:
Global: /.../.config/dvc
System: /.../.config/kdedefaults/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/mapper/vgkubuntu-root
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/bdf5f37be5108aada94933a567e64744
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-09-30 13:41:42,433 DEBUG: Analytics is enabled.
2024-09-30 13:41:42,458 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpwllf0ijo', '-v']
2024-09-30 13:41:42,465 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpwllf0ijo', '-v'] with pid 111408
2024-09-30 13:41:42,466 DEBUG: Removing '/tmp/tmp5s8bt4cedvc-clone'
2024-09-30 13:41:42,495 DEBUG: Removing '/tmp/tmpsuoa_qcgdvc-cache'
Then I checked whether I can push from repo-b. I can:
[repo-b] [master *]
-> % touch test
-> % uv run dvc push
Collecting
|1.00 [00:00, 234entry/s]
Pushing
1 file pushed
Same problem when I target a specific file:
[repo-b] [master *]
-> % uv run dvc get "[email protected]:<org>/repo-a.git" "data/master-table.csv"
ERROR: unexpected error - [Errno 2] No storage files available: 'data/master-table.csv'
But the file IS on the remote. I can pull it in the cloned repo-a.
Also, see this:
-> % uv run dvc get [email protected]:<org>/repo-a.git data
ERROR: unexpected error - failed to load directory ('data', 'cvat', 'datumaro-dataset'): [Errno 2] No such file or directory: '/tmp/tmp_tgyr2ymdvc-cache/files/md5/8a/6de34918ed22935e97644bf465f920.dir'
This file (files/md5/8a/6de34918ed22935e97644bf465f920.dir) DOES exist on the remote!
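For reference, DVC 3.x stores objects under files/md5/, using the first two hex characters of the hash as a subdirectory and the remainder as the filename. Directory manifests keep the .dir suffix that appears in the .dvc file. The bash sketch below maps the hash from the error message to that relative path. object_path is a hypothetical helper for illustration, not a DVC command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DVC): map a DVC 3.x md5 hash, as written
# in a .dvc file (e.g. "<32 hex chars>.dir" for a directory), to the relative
# object path used under the cache and the remote.
object_path() {
  local hash="$1"
  # First two characters -> subdirectory, remaining characters -> filename.
  printf 'files/md5/%s/%s\n' "${hash:0:2}" "${hash:2}"
}

object_path 8a6de34918ed22935e97644bf465f920.dir
# prints files/md5/8a/6de34918ed22935e97644bf465f920.dir
```

Assuming repo-a uses the default cache location, the same relative path can be checked in the cloned repo with something like ls "/tmp/repo-a/.dvc/cache/$(object_path 8a6de34918ed22935e97644bf465f920.dir)".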
Environment information
-> % uv pip list G dvc
dvc 3.55.2
dvc-data 3.16.5
dvc-http 2.32.0
dvc-objects 5.1.0
dvc-render 1.0.2
dvc-studio-client 0.21.0
dvc-task 0.4.0
-> % uname -a
Linux <name> 6.8.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 11 15:25:05 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
-> % python --version
Python 3.10.13
Output of dvc doctor:
DVC version: 3.55.2 (pip)
-------------------------
Platform: Python 3.10.14 on Linux-6.8.0-45-generic-x86_64-with-glibc2.35
Subprojects:
dvc_data = 3.16.6
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.4.0
scmrepo = 3.3.8
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.18.0),
http (aiohttp = 3.10.8, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.8, aiohttp-retry = 2.8.3)
Config:
Global: /home/mbs/.config/dvc
System: /home/mbs/.config/kdedefaults/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/mapper/vgkubuntu-root
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/bdf5f37be5108aada94933a567e64744
I already deleted /var/tmp/dvc/. It did not help.