dvc
pull: fails on HDFS after removing `.dvc/cache`
Bug Report
Description
`dvc pull` fails on HDFS after removing `.dvc/cache`. In practice, anyone who clones the repository and then runs `dvc pull` always hits this failure.
However, `dvc pull -q` succeeds, so the failure seems to be triggered by progress output rather than by the transfer itself.
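The `-q` observation can be illustrated with a simplified, self-contained model. It assumes that `-q` suppresses the progress bar, so the meter check `total + 0.5` (taken from the tqdm line in the traceback) is never evaluated. `ToyProgressBar` and `FileLike` are hypothetical stand-ins, not the real tqdm, fsspec, or DVC classes.

```python
class ToyProgressBar:
    """Hypothetical stand-in for a tqdm-style progress bar."""

    def __init__(self, total, disable=False):
        self.total = total
        self.n = 0
        if disable:
            # Quiet mode (assumed effect of `-q`): nothing is rendered.
            return
        self.render()

    def render(self):
        # Same check as tqdm's format_meter line shown in the traceback.
        if self.total and self.n >= (self.total + 0.5):
            return "done"
        return f"{self.n}/{self.total}"


class FileLike:
    """Hypothetical file object whose `size` is a method, not a number."""

    def size(self):
        return 1024


bad_total = getattr(FileLike(), "size", None)  # a bound method

ToyProgressBar(bad_total, disable=True)  # quiet: never rendered, no error

try:
    ToyProgressBar(bad_total)  # verbose: rendering evaluates total + 0.5
except TypeError as exc:
    print(exc)  # prints: unsupported operand type(s) for +: 'method' and 'float'
```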
Here are some observations that will hopefully help with debugging:
- The variable `total` is not a number, and this causes the error.
- `**d` contains `total`, which comes from `size`. In this case, however, `size` is not a number but a bound method (see here).
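The points above can be checked without HDFS. This sketch assumes, consistent with the `getattr(f1, "size", None)` call in the traceback, that the remote file object exposes `size` as a method (pyarrow file objects do provide a `size()` method). `ArrowLikeFile` and `normalize_size` are hypothetical names for illustration; the helper shows one possible defensive normalization before `set_size()`, not the upstream fix.

```python
import numbers


class ArrowLikeFile:
    """Hypothetical file object whose `size` is a method, not a number."""

    def size(self):
        return 2048


def normalize_size(value):
    """Return a numeric size or None (hypothetical helper).

    If `value` is callable (e.g. a bound `size()` method), call it.
    Anything that is still not a real number becomes None, which
    tqdm treats as an unknown total.
    """
    if callable(value):
        value = value()
    if isinstance(value, numbers.Real) and not isinstance(value, bool):
        return value
    return None


f1 = ArrowLikeFile()
raw = getattr(f1, "size", None)  # mirrors what set_size() receives
print(type(raw).__name__)        # method
print(normalize_size(raw))       # 2048
print(normalize_size(None))      # None
```

Calling the method keeps a meaningful total for the progress bar; falling back to `None` would also avoid the crash, at the cost of showing an unknown total.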
Reproduce
- dvc init
- Copy dataset.zip to the directory
- dvc remote add -d storage hdfs://user/dvc/mystorage
- dvc add dataset.zip
- dvc push
- rm -rf dataset.zip .dvc/cache
- dvc pull
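For convenience, the reproduction steps can be scripted. This is a hypothetical sketch that assumes `dvc` is on `PATH`, the working directory is a Git repository that already contains `dataset.zip`, and the HDFS remote is reachable; it simply mirrors the steps above and is not an official test harness.

```python
import shutil
import subprocess
from pathlib import Path

REMOTE_URL = "hdfs://user/dvc/mystorage"

# Setup commands from the reproduction steps (dataset.zip assumed present).
SETUP_STEPS = [
    ["dvc", "init"],
    ["dvc", "remote", "add", "-d", "storage", REMOTE_URL],
    ["dvc", "add", "dataset.zip"],
    ["dvc", "push"],
]


def reproduce(repo: Path = Path(".")) -> int:
    """Run the steps and return the exit code of the final `dvc pull -v`."""
    for cmd in SETUP_STEPS:
        subprocess.run(cmd, cwd=repo, check=True)
    # Simulate a fresh clone: drop the workspace file and the local cache.
    (repo / "dataset.zip").unlink(missing_ok=True)
    shutil.rmtree(repo / ".dvc" / "cache", ignore_errors=True)
    # Expected to fail while the bug is present (`-q` reportedly succeeds).
    result = subprocess.run(["dvc", "pull", "-v"], cwd=repo)
    return result.returncode
```

Calling `reproduce()` returns the exit code of the final `dvc pull -v`; a non-zero value is expected while the bug is present.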
Expected
`dvc pull` and `dvc fetch` complete successfully on HDFS.
Environment information
Output of `dvc doctor`:
$ dvc doctor
DVC version: 3.55.2 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.10.4-linuxkit-x86_64-with-glibc2.28
Subprojects:
dvc_data = 3.16.5
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.4.0
scmrepo = 3.3.7
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.17.1),
gdrive (pydrive2 = 1.20.0),
gs (gcsfs = 2024.9.0.post1),
hdfs (fsspec = 2024.9.0, pyarrow = 17.0.0),
http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.9.0, boto3 = 1.35.16),
ssh (sshfs = 2024.6.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.9.0)
Config:
Global: /home/user/.config/dvc
System: /etc/xdg/dvc
Cache types: symlink
Cache directory: fuse.osxfs on osxfs
Caches: local
Remotes: hdfs
Workspace directory: fuse.osxfs on osxfs
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/19c955812b0a09cd409a3779f4e4d774
Additional Information (if any):
The verbose error log is attached below.
$ dvc pull -v
2024-10-08 15:51:47,388 DEBUG: v3.55.2 (pip), CPython 3.10.12 on Linux-6.10.4-linuxkit-x86_64-with-glibc2.28
2024-10-08 15:51:47,390 DEBUG: command: /home/user/.local/bin/dvc pull -v
Collecting |0.00 [00:00, ?entry/s]
Fetching2024-10-08 15:51:49,343 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-10-08 15:51:50,297 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2024-10-08 15:51:50,625 DEBUG: Preparing to transfer data from 'hdfs://user/dvc/mystorage/files/md5' to '/home/user/repo/.dvc/cache/files/md5'
2024-10-08 15:51:50,625 DEBUG: Preparing to collect status from '/home/user/repo/.dvc/cache/files/md5'
2024-10-08 15:51:50,625 DEBUG: Collecting status from '/home/user/repo/.dvc/cache/files/md5'
2024-10-08 15:51:50,629 DEBUG: Preparing to collect status from '/user/dvc/mystorage/files/md5'
2024-10-08 15:51:50,630 DEBUG: Collecting status from '/user/dvc/mystorage/files/md5'
2024-10-08 15:51:50,691 DEBUG: Estimated remote size: 256 files
2024-10-08 15:51:50,692 DEBUG: Querying 2 oids via traverse
Fetching
0%| |Fetching from hdfs 0/1 [00:00<?, ?file/s]
2024-10-08 15:51:51,217 DEBUG: Removing '/home/user/repo/.dvc/cache/files/md5/12/.bnFqV3d0PmZTKtbQoPM-8A.tmp'
2024-10-08 15:51:51,219 ERROR: failed to transfer '126a8a51b9d1bbd07fddc65819a542c3' - unsupported operand type(s) for +: 'method' and 'float'
Traceback (most recent call last):
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 349, in transfer
_try_links(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 281, in _try_links
return copy(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 97, in copy
return _get(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 227, in _get
_get_one(from_paths[0], to_paths[0])
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 217, in _get_one
return from_fs.get_file(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 645, in get_file
self.fs.get_file(from_info, to_info, callback=callback, **kwargs)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/implementations/arrow.py", line 210, in get_file
super().get_file(rpath, lpath, **kwargs)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/spec.py", line 904, in get_file
callback.set_size(getattr(f1, "size", None))
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/callbacks.py", line 97, in set_size
self.call()
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/callbacks.py", line 311, in call
self.tqdm = self._tqdm_cls(total=self.size, **self._tqdm_kwargs)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/callbacks.py", line 92, in __init__
super().__init__(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1098, in __init__
self.refresh(lock_args=self.lock_args)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1347, in refresh
self.display()
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1495, in display
self.sp(self.__str__() if msg is None else msg)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1151, in __str__
return self.format_meter(**self.format_dict)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/callbacks.py", line 129, in format_dict
meter = self.format_meter( # type: ignore[call-arg]
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 534, in format_meter
if total and n >= (total + 0.5): # allow float imprecision (#849)
TypeError: unsupported operand type(s) for +: 'method' and 'float'
Fetching Exception ignored in: <function tqdm.__del__ at 0x7ffffdaf53f0> 0/1 [00:00<?, ?file/s]
Traceback (most recent call last):
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1148, in __del__
self.close()
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/callbacks.py", line 115, in close
self.postfix["info"] = ""
TypeError: 'NoneType' object does not support item assignment
2024-10-08 15:51:51,224 DEBUG: failed to protect '/home/user/repo/.dvc/cache/files/md5/12/6a8a51b9d1bbd07fddc65819a542c3' - [Errno 2] No such file or directory: '/home/user/repo/.dvc/cache/files/md5/12/6a8a51b9d1bbd07fddc65819a542c3'
Traceback (most recent call last):
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/hashfile/db/local.py", line 117, in protect
os.chmod(path, self.CACHE_MODE)
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/repo/.dvc/cache/files/md5/12/6a8a51b9d1bbd07fddc65819a542c3'
Fetching
2024-10-08 15:51:51,227 ERROR: failed to pull data from the cloud - 1 files failed to download
Traceback (most recent call last):
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 35, in run
stats = self.repo.pull(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/pull.py", line 30, in pull
processed_files_count = self.fetch(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/fetch.py", line 200, in fetch
raise DownloadError(failed_count)
dvc.exceptions.DownloadError: 1 files failed to download
2024-10-08 15:51:51,234 DEBUG: Analytics is disabled.