push: Google bucket with hyphen in name results in "ERROR: unexpected error - b/my-repository/o"

Open cgarbin opened this issue 3 years ago • 2 comments

Bug Report

Description

Followed the instructions to use a GCP remote on this iterative.ai blog post.

When I got to the dvc push step, it failed with ERROR: unexpected error - b/my-repository/o. The last few lines of the traceback:

  File "/..masked path.../.venv/lib/python3.10/site-packages/gcsfs/retry.py", line 84, in validate_response
    raise FileNotFoundError(path)
FileNotFoundError: b/my-repository/o

See end of this report for the full traceback.
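
For context, `b/my-repository/o` has the shape of the Cloud Storage JSON API path for listing the objects in a bucket (`b/<bucket>/o`), so the `FileNotFoundError` suggests that the list request itself came back as "not found". Below is a minimal sketch of how such a path is formed, using only the standard library. `object_list_path`, `object_list_url`, and `GCS_API_BASE` are names made up for this illustration and are not gcsfs internals.

```python
from urllib.parse import quote

# Base of the Cloud Storage JSON API (v1).
GCS_API_BASE = "https://storage.googleapis.com/storage/v1"


def object_list_path(bucket: str) -> str:
    """Return the relative JSON API path that lists the objects in `bucket`."""
    # Hyphens are unreserved URL characters, so quote() leaves them untouched.
    return f"b/{quote(bucket, safe='')}/o"


def object_list_url(bucket: str, prefix: str = "") -> str:
    """Return the full list URL, optionally restricted to an object-name prefix."""
    url = f"{GCS_API_BASE}/{object_list_path(bucket)}"
    if prefix:
        url += f"?prefix={quote(prefix, safe='')}"
    return url


print(object_list_path("my-repository"))
print(object_list_url("my-repository", prefix="00/"))
```

The hyphen passes through URL quoting unchanged, which is consistent with the path shown in the error message. This alone does not explain why the request failed.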

Reproduce

Follow the steps on the GCP remote blog post, naming the GCP bucket with a hyphen, e.g. my-repository.

When running the dvc push part of the guide, it will show the error above.

Expected

Expected the local objects to be uploaded to the GCP bucket.

After I tried with a bucket that doesn't have a hyphen, e.g. myrepository, the dvc push step worked:

> dvc push -v
2022-09-14 16:28:47,718 DEBUG: Preparing to transfer data from '...' to 'myrepository'
...
2022-09-14 16:28:50,281 DEBUG: Querying '775' oids via traverse
...
775 files pushed

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.24.0 (pip)
---------------------------------
Platform: Python 3.10.2 on macOS-12.4-x86_64-i386-64bit
Subprojects:
	dvc_data = 0.4.0
	dvc_objects = 0.2.0
	dvc_render = 0.0.9
	dvc_task = 0.1.2
	dvclive = 0.10.0
	scmrepo = 0.0.25
Supports:
	gs (gcsfs = 2022.8.2),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.8.3)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: gs
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git

Additional Information (if any):

Full error traceback:

> dvc push -v
2022-09-14 16:11:25,599 DEBUG: Preparing to transfer data from '/..masked path.../.dvc/cache' to 'my-repository'
2022-09-14 16:11:25,599 DEBUG: Preparing to collect status from 'my-repository'
2022-09-14 16:11:25,600 DEBUG: Collecting status from 'my-repository'
2022-09-14 16:11:25,600 DEBUG: Querying 1 oids via object_exists
2022-09-14 16:11:28,427 ERROR: unexpected error - b/my-repository/o
------------------------------------------------------------
Traceback (most recent call last):
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 59, in run
    processed_files_count = self.repo.push(
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc/repo/push.py", line 68, in push
    pushed += self.cloud.push(
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc/data_cloud.py", line 109, in push
    return self.transfer(
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc/data_cloud.py", line 88, in transfer
    return transfer(src_odb, dest_odb, objs, **kwargs)
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc_data/transfer.py", line 158, in transfer
    status = compare_status(
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc_data/status.py", line 179, in compare_status
    dest_exists, dest_missing = status(
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc_data/status.py", line 151, in status
    odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback)
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc_objects/db.py", line 337, in oids_exist
    remote_size, remote_oids = self._estimate_remote_size(
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc_objects/db.py", line 214, in _estimate_remote_size
    remote_oids = set(iter_with_pbar(oids))
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc_objects/db.py", line 204, in iter_with_pbar
    for oid in oids:
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc_objects/db.py", line 170, in _oids_with_limit
    for oid in self._list_oids(prefix):
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc_objects/db.py", line 160, in _list_oids
    for path in self._list_paths(prefix):
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc_objects/db.py", line 144, in _list_paths
    yield from self.fs.find(self.fs.path.join(*parts), prefix=bool(prefix))
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 535, in find
    files = self.fs.find(with_prefix, prefix=self.path.parts(path)[-1])
  File "/..masked path.../.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 111, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/..masked path.../.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 96, in sync
    raise return_result
  File "/..masked path.../.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 53, in _runner
    result[0] = await coro
  File "/..masked path.../.venv/lib/python3.10/site-packages/dvc_gs/gcsfs.py", line 24, in _find
    objects, _ = await self._do_list_objects(
  File "/..masked path.../.venv/lib/python3.10/site-packages/gcsfs/core.py", line 521, in _do_list_objects
    page = await self._call(
  File "/..masked path.../.venv/lib/python3.10/site-packages/gcsfs/core.py", line 392, in _call
    status, headers, info, contents = await self._request(
  File "/..masked path.../.venv/lib/python3.10/site-packages/decorator.py", line 221, in fun
    return await caller(func, *(extras + args), **kw)
  File "/..masked path.../.venv/lib/python3.10/site-packages/gcsfs/retry.py", line 115, in retry_request
    return await func(*args, **kwargs)
  File "/..masked path.../.venv/lib/python3.10/site-packages/gcsfs/core.py", line 384, in _request
    validate_response(status, contents, path, args)
  File "/..masked path.../.venv/lib/python3.10/site-packages/gcsfs/retry.py", line 84, in validate_response
    raise FileNotFoundError(path)
FileNotFoundError: b/my-repository/o

cgarbin avatar Sep 14 '22 22:09 cgarbin

Hi. I think this is related to the fact that the remote is set to the bucket's root. If you're following the guide, could you modify the remote url so that it points to gs://updatedbikedata/cache?

dvc remote modify bikes url gs://updatedbikedata/cache (or just delete/re-add it: dvc remote add -d bikes gs://updatedbikedata/cache)

dtrifiro avatar Sep 15 '22 12:09 dtrifiro

If you're following the guide, could you modify the remote url so that it points to gs://updatedbikedata/cache?

Thanks for the quick reply. Note from the description that:

  1. I'm following this blog post. In the section to create the remote, it doesn't require the cache. The command there is $ dvc remote add -d bikes gs://updatedbikedata.
  2. It worked as soon as I changed from my-repository to myrepository (removed the hyphen).

To be more precise:

  • This fails: dvc remote add -d my-repository gs://my-repository
  • This works: dvc remote add -d myrepository gs://myrepository

cgarbin avatar Sep 15 '22 12:09 cgarbin

Looks like a gcsfs limitation, so there is not much we can do on the dvc side. Closing for now.

efiop avatar Jan 01 '23 21:01 efiop

Hello @efiop, can you please point to a reference that documents the gcsfs limitation? I'm asking because I can access buckets with hyphens in other applications.

cgarbin avatar Jan 02 '23 13:01 cgarbin

@cgarbin I don't have anything specific to point to, unfortunately. Just saying that this issue doesn't seem to be in dvc specifically, but rather in an underlying standalone library. One would need to research further (e.g. try gcsfs out), but we don't have the capacity to do it ourselves right now 🙁 If you would be willing to research yourself, we will be happy to help.
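
A standalone check along those lines could be sketched as follows. It assumes gcsfs is installed and that default Google credentials are available. `check_bucket` and `main` are hypothetical helpers written for this illustration, and the bucket names are passed on the command line. `ls` and `find` are the public fsspec-style listing calls on `GCSFileSystem`; they may not exercise exactly the same code path as the `_find` override in `dvc_gs` that appears in the traceback.

```python
import sys


def check_bucket(fs, bucket):
    """List `bucket` through an fsspec-style filesystem and report the outcome."""
    report = {"bucket": bucket}
    for name in ("ls", "find"):
        try:
            entries = getattr(fs, name)(bucket)
            report[name] = f"ok ({len(entries)} entries)"
        except Exception as exc:  # diagnostic script: record any failure
            report[name] = f"{type(exc).__name__}: {exc}"
    return report


def main(buckets):
    if not buckets:
        print("usage: python check_bucket.py BUCKET [BUCKET ...]")
        return
    import gcsfs  # third-party; imported only when there is work to do

    fs = gcsfs.GCSFileSystem()  # assumption: default credentials are configured
    for bucket in buckets:
        print(check_bucket(fs, bucket))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it as `python check_bucket.py my-repository myrepository` would show whether gcsfs alone fails on the hyphenated bucket, which would help separate a gcsfs problem from one in dvc's wrapper.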

efiop avatar Jan 02 '23 14:01 efiop

@cgarbin does this keep happening with recent dvc versions?

I was able to reproduce back in September, but it seems the issue has been fixed in the meantime.

Tested with

dvc = 2.38.1
dvc_data = 0.28.4
dvc_objects = 0.14.0
gcsfs = 2022.11
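
To compare a local environment against the versions listed above, a small sketch using only the standard library is shown here. The distribution names `dvc-data` and `dvc-objects` are assumed to be the PyPI names of the subprojects, and `installed_versions` is a hypothetical helper. `dvc doctor`, as used earlier in this report, prints similar information.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the packages listed above.
PACKAGES = ("dvc", "dvc-data", "dvc-objects", "gcsfs")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name} = {ver or 'not installed'}")
```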

dtrifiro avatar Jan 02 '23 14:01 dtrifiro