
get / import: No storage files available

Open mbspng opened this issue 4 months ago • 10 comments

Bug Report

Description

I have tracked files in repo-a under data. dvc import and dvc get both fail when trying to get files from repo-a in repo-b.

Reproduce

I cloned my own repo (repo-a) under /tmp to test whether dvc pull works. It does. Then I checked status and remote:

[/tmp/repo-a] [master *]
-> % uv run dvc status -c
Cache and remote 'azure-blob' are in sync.      

[/tmp/repo-a] [master *]
-> % uv run dvc list --dvc-only .
data

So that is all correct.

Then I go to repo-b, where I configured the remote to be the same as the one in repo-a. Here is the check:

[repo-b] [master *]
-> % diff .dvc/config.local /tmp/repo-a/.dvc/config.local | wc -l
0
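Note that this diff only covers .dvc/config.local. One thing worth ruling out is that the remote is defined only there: assuming a default DVC setup, where .dvc/config.local is gitignored, the temporary clone that dvc get creates would only see what is committed in .dvc/config. Below is a minimal sketch for listing which remotes a config file defines. The helper name remote_names, the sample config text, and the azure:// URL are illustrative assumptions, not taken from the actual repositories.

```python
import configparser


def remote_names(config_text: str) -> set[str]:
    """Return the remote names declared in DVC-style config text.

    Assumes remotes appear as sections named like ['remote "name"'].
    """
    # Disable interpolation so '%' characters in URLs are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    names = set()
    for section in parser.sections():
        cleaned = section.strip("'")  # drop the optional outer quotes
        if cleaned.startswith('remote "') and cleaned.endswith('"'):
            names.add(cleaned[len('remote "'):-1])
    return names


# Hypothetical contents of the committed .dvc/config
committed = """
[core]
    remote = azure-blob
"""

# Hypothetical contents of the untracked .dvc/config.local
local_only = """
['remote "azure-blob"']
    url = azure://example-container/example-path
"""

# Remotes that exist only in the local, uncommitted config
missing = remote_names(local_only) - remote_names(committed)
print(missing)
```

If the set printed for the real files is non-empty, the remote definition lives only in the local config and would not travel with a git clone.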

Then I try to get the data from repo-a. It fails:

[repo-b] [master *]
-> % uv run dvc list "[email protected]:<org>/repo-a.git" --dvc-only
data      

[repo-b] [master *]
-> % uv run dvc get "[email protected]:<org>/repo-a.git" "data" -v
2024-09-30 13:41:19,905 DEBUG: v3.55.2 (pip), CPython 3.10.14 on Linux-6.8.0-45-generic-x86_64-with-glibc2.35
2024-09-30 13:41:19,906 DEBUG: command: /.../repo-b/.venv/bin/dvc get [email protected]:<org>/repo-a.git data -v
2024-09-30 13:41:19,985 DEBUG: Creating external repo [email protected]:<org>/repo-a.git@None
2024-09-30 13:41:19,985 DEBUG: erepo: git clone '[email protected]:<org>/repo-a.git' to a temporary dir
2024-09-30 13:41:42,394 DEBUG: failed to load ('data', 'cvat', 'datumaro-dataset') from storage local (/tmp/tmpsuoa_qcgdvc-cache/files/md5) - [Errno 2] No such file or directory: '/tmp/tmpsuoa_qcgdvc-cache/files/md5/8a/6de34918ed22935e97644bf465f920.dir'
Traceback (most recent call last):
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 611, in _load_from_storage
    _load_from_object_storage(trie, entry, storage)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 547, in _load_from_object_storage
    obj = Tree.load(storage.odb, root_entry.hash_info, hash_name=storage.odb.hash_name)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/hashfile/tree.py", line 193, in load
    with obj.fs.open(obj.path, "r") as fobj:
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 324, in open
    return self.fs.open(path, mode=mode, **kwargs)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_objects/fs/local.py", line 131, in open
    return open(path, mode=mode, encoding=encoding)  # noqa: SIM115
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpsuoa_qcgdvc-cache/files/md5/8a/6de34918ed22935e97644bf465f920.dir'

2024-09-30 13:41:42,401 ERROR: unexpected error - failed to load directory ('data', 'cvat', 'datumaro-dataset'): [Errno 2] No such file or directory: '/tmp/tmpsuoa_qcgdvc-cache/files/md5/8a/6de34918ed22935e97644bf465f920.dir'                           
Traceback (most recent call last):
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 611, in _load_from_storage
    _load_from_object_storage(trie, entry, storage)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 547, in _load_from_object_storage
    obj = Tree.load(storage.odb, root_entry.hash_info, hash_name=storage.odb.hash_name)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/hashfile/tree.py", line 193, in load
    with obj.fs.open(obj.path, "r") as fobj:
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 324, in open
    return self.fs.open(path, mode=mode, **kwargs)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_objects/fs/local.py", line 131, in open
    return open(path, mode=mode, encoding=encoding)  # noqa: SIM115
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpsuoa_qcgdvc-cache/files/md5/8a/6de34918ed22935e97644bf465f920.dir'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/cli/command.py", line 41, in do_run
    return self.run()
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/commands/get.py", line 30, in run
    return self._get_file_from_repo()
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/commands/get.py", line 37, in _get_file_from_repo
    Repo.get(
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/repo/get.py", line 64, in get
    download(fs, fs_path, os.path.abspath(out), jobs=jobs)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/fs/__init__.py", line 67, in download
    return fs._get(fs_path, to, batch_size=jobs, callback=cb)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/fs/dvc.py", line 692, in _get
    return self.fs._get(
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/fs/dvc.py", line 543, in _get
    for root, dirs, files in self.walk(rpath, maxdepth=maxdepth, detail=True):
  File "/.../repo-b/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 468, in walk
    yield from self.walk(
  File "/.../repo-b/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 468, in walk
    yield from self.walk(
  File "/.../repo-b/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 427, in walk
    listing = self.ls(path, detail=True, **kwargs)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc/fs/dvc.py", line 382, in ls
    for info in dvc_fs.ls(dvc_path, detail=True):
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 519, in ls
    return self.fs.ls(path, detail=detail, **kwargs)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/fs.py", line 164, in ls
    for key, info in self.index.ls(root_key, detail=True):
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 764, in ls
    self._ensure_loaded(root_key)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 761, in _ensure_loaded
    self._load(prefix, entry)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 710, in _load
    self.onerror(entry, exc)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 638, in _onerror
    raise exc
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 708, in _load
    _load_from_storage(self._trie, entry, storage_info)
  File "/.../repo-b/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 626, in _load_from_storage
    raise DataIndexDirError(f"failed to load directory {entry.key}") from last_exc
dvc_data.index.index.DataIndexDirError: failed to load directory ('data', 'cvat', 'datumaro-dataset')

2024-09-30 13:41:42,432 DEBUG: Version info for developers:
DVC version: 3.55.2 (pip)
-------------------------
Platform: Python 3.10.14 on Linux-6.8.0-45-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.16.6
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.8
Supports:
        azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.18.0),
        http (aiohttp = 3.10.8, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.10.8, aiohttp-retry = 2.8.3)
Config:
        Global: /.../.config/dvc
        System: /.../.config/kdedefaults/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/mapper/vgkubuntu-root
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/bdf5f37be5108aada94933a567e64744

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-09-30 13:41:42,433 DEBUG: Analytics is enabled.
2024-09-30 13:41:42,458 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpwllf0ijo', '-v']
2024-09-30 13:41:42,465 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpwllf0ijo', '-v'] with pid 111408
2024-09-30 13:41:42,466 DEBUG: Removing '/tmp/tmp5s8bt4cedvc-clone'
2024-09-30 13:41:42,495 DEBUG: Removing '/tmp/tmpsuoa_qcgdvc-cache'

Then I checked whether I can push from repo-b. I can.

[repo-b] [master *]
-> % touch test

-> % uv run dvc push
Collecting
|1.00 [00:00,  234entry/s]
Pushing
1 file pushed

Same problem when I target a specific file:

[repo-b] [master *]

-> % uv run dvc get "[email protected]:<org>/repo-a.git" "data/master-table.csv"
ERROR: unexpected error - [Errno 2] No storage files available: 'data/master-table.csv' 

But the file IS on the remote. I can pull it in the cloned repo-a.

Also, see this:

-> % uv run dvc get [email protected]:<org>/repo-a.git data
ERROR: unexpected error - failed to load directory ('data', 'cvat', 'datumaro-dataset'): [Errno 2] No such file or directory: '/tmp/tmp_tgyr2ymdvc-cache/files/md5/8a/6de34918ed22935e97644bf465f920.dir'   

This file (files/md5/8a/6de34918ed22935e97644bf465f920.dir) DOES exist on the remote!
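For reference, the object path follows from the directory hash, as the paths in the error messages show: the first two hex characters become a subdirectory under files/md5 and the remainder becomes the file name. Here is a minimal sketch of that mapping, which may help when looking the object up by hand on the remote. The function name object_path is hypothetical and not part of DVC's API.

```python
from pathlib import PurePosixPath


def object_path(md5: str, prefix: str = "files/md5") -> str:
    """Map a hash (optionally ending in .dir) to its relative object path."""
    # The first two characters name the subdirectory, the rest is the file name.
    return str(PurePosixPath(prefix) / md5[:2] / md5[2:])


print(object_path("8a6de34918ed22935e97644bf465f920.dir"))
# prints files/md5/8a/6de34918ed22935e97644bf465f920.dir
```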

Environment information

-> % uv pip list G dvc
dvc                           3.55.2
dvc-data                      3.16.5
dvc-http                      2.32.0
dvc-objects                   5.1.0
dvc-render                    1.0.2
dvc-studio-client             0.21.0
dvc-task                      0.4.0

-> % uname  -a
Linux <name> 6.8.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 11 15:25:05 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

-> % python --version
Python 3.10.13

Output of dvc doctor:

DVC version: 3.55.2 (pip)
-------------------------
Platform: Python 3.10.14 on Linux-6.8.0-45-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.16.6
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.8
Supports:
        azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.18.0),
        http (aiohttp = 3.10.8, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.10.8, aiohttp-retry = 2.8.3)
Config:
        Global: /home/mbs/.config/dvc
        System: /home/mbs/.config/kdedefaults/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/mapper/vgkubuntu-root
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/bdf5f37be5108aada94933a567e64744

I already deleted /var/tmp/dvc/. Did not help.

mbspng, Sep 30 '24 12:09