pull: fails on HDFS after removing `.dvc/cache`

Open zsaladin opened this issue 4 months ago • 2 comments

Bug Report

Description

dvc pull fails on HDFS after .dvc/cache is removed. In practice, this means that dvc pull always fails for anyone who has just cloned the repository. However, dvc pull -q succeeds, so the failure seems to be caused by the progress/log output.

Here are some observations that may help with debugging.

  1. The variable total is not a number, which causes the error.
  2. The variable **d contains total, which comes from size.
  3. In this case, however, size is not a number. It is a bound method, as returned by getattr(f1, "size", None) in fsspec/spec.py (see the traceback below).
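
The three points above can be reproduced without HDFS. The following is only a minimal sketch, not the actual fsspec, tqdm, or DVC code: FakeRemoteFile, format_check, and normalize_size are hypothetical names made up for illustration. The sketch assumes a file object that exposes size as a method rather than an attribute, and it repeats the comparison shown in the last frame of the traceback.

```python
class FakeRemoteFile:
    """Hypothetical stand-in for a file object whose `size` is a method."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def size(self) -> int:
        return len(self._data)


def format_check(n, total):
    # Same comparison as the tqdm line at the bottom of the traceback.
    return bool(total and n >= (total + 0.5))


def normalize_size(value):
    """Hypothetical guard: call `value` if it is callable, keep numbers only."""
    if callable(value):
        value = value()
    return value if isinstance(value, (int, float)) else None


f1 = FakeRemoteFile(b"x" * 10)

# Mirrors getattr(f1, "size", None) from the traceback:
# the result is a bound method, not a number.
raw_total = getattr(f1, "size", None)

try:
    format_check(0, raw_total)
    error = None
except TypeError as exc:
    error = str(exc)

# With the guard applied, the same comparison works.
safe_total = normalize_size(raw_total)
print(error)
print(safe_total, format_check(0, safe_total))
```

A guard like normalize_size is just one possible direction for a fix. Where such a check belongs (fsspec, the callback, or the progress bar class) is a decision for the maintainers.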

Reproduce

  1. dvc init
  2. Copy dataset.zip to the directory
  3. dvc remote add -d storage hdfs://user/dvc/mystorage
  4. dvc add dataset.zip
  5. dvc push
  6. rm -rf dataset.zip .dvc/cache
  7. dvc pull

Expected

dvc pull and dvc fetch should run successfully on HDFS.

Environment information

Output of dvc doctor:

$ dvc doctor

DVC version: 3.55.2 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.10.4-linuxkit-x86_64-with-glibc2.28
Subprojects:
	dvc_data = 3.16.5
	dvc_objects = 5.1.0
	dvc_render = 1.0.2
	dvc_task = 0.4.0
	scmrepo = 3.3.7
Supports:
	azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.17.1),
	gdrive (pydrive2 = 1.20.0),
	gs (gcsfs = 2024.9.0.post1),
	hdfs (fsspec = 2024.9.0, pyarrow = 17.0.0),
	http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
	oss (ossfs = 2023.12.0),
	s3 (s3fs = 2024.9.0, boto3 = 1.35.16),
	ssh (sshfs = 2024.6.0),
	webdav (webdav4 = 0.10.0),
	webdavs (webdav4 = 0.10.0),
	webhdfs (fsspec = 2024.9.0)
Config:
	Global: /home/user/.config/dvc
	System: /etc/xdg/dvc
Cache types: symlink
Cache directory: fuse.osxfs on osxfs
Caches: local
Remotes: hdfs
Workspace directory: fuse.osxfs on osxfs
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/19c955812b0a09cd409a3779f4e4d774

Additional Information (if any):

The error log is attached below.

$ dvc pull -v

2024-10-08 15:51:47,388 DEBUG: v3.55.2 (pip), CPython 3.10.12 on Linux-6.10.4-linuxkit-x86_64-with-glibc2.28
2024-10-08 15:51:47,390 DEBUG: command: /home/user/.local/bin/dvc pull -v
Collecting                                                                                                                                                                                                                         |0.00 [00:00,    ?entry/s]
Fetching2024-10-08 15:51:49,343 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-10-08 15:51:50,297 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2024-10-08 15:51:50,625 DEBUG: Preparing to transfer data from 'hdfs://user/dvc/mystorage/files/md5' to '/home/user/repo/.dvc/cache/files/md5'
2024-10-08 15:51:50,625 DEBUG: Preparing to collect status from '/home/user/repo/.dvc/cache/files/md5'
2024-10-08 15:51:50,625 DEBUG: Collecting status from '/home/user/repo/.dvc/cache/files/md5'
2024-10-08 15:51:50,629 DEBUG: Preparing to collect status from '/user/dvc/mystorage/files/md5'
2024-10-08 15:51:50,630 DEBUG: Collecting status from '/user/dvc/mystorage/files/md5'
2024-10-08 15:51:50,691 DEBUG: Estimated remote size: 256 files
2024-10-08 15:51:50,692 DEBUG: Querying 2 oids via traverse
Fetching
  0%|          |Fetching from hdfs                                                                                                                                                                                                 0/1 [00:00<?,     ?file/s]
2024-10-08 15:51:51,217 DEBUG: Removing '/home/user/repo/.dvc/cache/files/md5/12/.bnFqV3d0PmZTKtbQoPM-8A.tmp'
2024-10-08 15:51:51,219 ERROR: failed to transfer '126a8a51b9d1bbd07fddc65819a542c3' - unsupported operand type(s) for +: 'method' and 'float'
Traceback (most recent call last):
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 349, in transfer
    _try_links(
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 281, in _try_links
    return copy(
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 97, in copy
    return _get(
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 227, in _get
    _get_one(from_paths[0], to_paths[0])
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 217, in _get_one
    return from_fs.get_file(
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 645, in get_file
    self.fs.get_file(from_info, to_info, callback=callback, **kwargs)
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/implementations/arrow.py", line 210, in get_file
    super().get_file(rpath, lpath, **kwargs)
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/spec.py", line 904, in get_file
    callback.set_size(getattr(f1, "size", None))
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/callbacks.py", line 97, in set_size
    self.call()
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/callbacks.py", line 311, in call
    self.tqdm = self._tqdm_cls(total=self.size, **self._tqdm_kwargs)
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/callbacks.py", line 92, in __init__
    super().__init__(
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1098, in __init__
    self.refresh(lock_args=self.lock_args)
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1347, in refresh
    self.display()
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1495, in display
    self.sp(self.__str__() if msg is None else msg)
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1151, in __str__
    return self.format_meter(**self.format_dict)
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/callbacks.py", line 129, in format_dict
    meter = self.format_meter(  # type: ignore[call-arg]
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 534, in format_meter
    if total and n >= (total + 0.5):  # allow float imprecision (#849)
TypeError: unsupported operand type(s) for +: 'method' and 'float'

Fetching                                                                                                                                                                                                                                                    Exception ignored in: <function tqdm.__del__ at 0x7ffffdaf53f0>                                                                                                                                                                     0/1 [00:00<?,     ?file/s]
Traceback (most recent call last):
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1148, in __del__
    self.close()
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/callbacks.py", line 115, in close
    self.postfix["info"] = ""
TypeError: 'NoneType' object does not support item assignment
2024-10-08 15:51:51,224 DEBUG: failed to protect '/home/user/repo/.dvc/cache/files/md5/12/6a8a51b9d1bbd07fddc65819a542c3' - [Errno 2] No such file or directory: '/home/user/repo/.dvc/cache/files/md5/12/6a8a51b9d1bbd07fddc65819a542c3'
Traceback (most recent call last):
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/hashfile/db/local.py", line 117, in protect
    os.chmod(path, self.CACHE_MODE)
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/repo/.dvc/cache/files/md5/12/6a8a51b9d1bbd07fddc65819a542c3'

Fetching
2024-10-08 15:51:51,227 ERROR: failed to pull data from the cloud - 1 files failed to download
Traceback (most recent call last):
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 35, in run
    stats = self.repo.pull(
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/pull.py", line 30, in pull
    processed_files_count = self.fetch(
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/fetch.py", line 200, in fetch
    raise DownloadError(failed_count)
dvc.exceptions.DownloadError: 1 files failed to download

2024-10-08 15:51:51,234 DEBUG: Analytics is disabled.

zsaladin, Oct 08 '24 07:10