gcsfs

Recursive download seems broken

Open • ngrislain opened this issue 5 years ago • 5 comments

When running fs.download(rpath, lpath, recursive=True), I now get the following error: IsADirectoryError: [Errno 21] Is a directory:

ngrislain • Apr 01 '20 09:04

Is lpath a directory, and does it exist? Could you provide more details, or perhaps a test case?

martindurant • Apr 01 '20 12:04

Hi Martin, I have the same issue as Nicolas. Our code worked on gcsfs 0.6.0 and fails on 0.6.1 (with Python 3.6, in my case).

from gcsfs import GCSFileSystem

fs = GCSFileSystem(project=PROJECT)
fs.download(GCS_URI, f'/tmp/{dataset_name}/', recursive=True)

where GCS_URI is a directory in a GCS bucket and lpath is indeed a directory on the local disk.

The following traceback shows where the error is raised:

  File "/home/xxx/.local/share/virtualenvs/private-learning-lab-iw_HFsYy/lib/python3.6/site-packages/fsspec/spec.py", line 977, in download
    return self.get(rpath, lpath, recursive=recursive, **kwargs)
  File "/home/xxx/.local/share/virtualenvs/private-learning-lab-iw_HFsYy/lib/python3.6/site-packages/fsspec/spec.py", line 610, in get
    with open(lpath, "wb") as f2:

So fsspec tries to open, as a file, a path that is actually a directory. I don't know why, but this didn't happen with gcsfs 0.6.0.
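A minimal local sketch of the failing step, assuming (as discussed further down the thread) that the remote listing contains a key whose path equals the directory itself. `naive_get` and `local_target` are hypothetical, simplified stand-ins for a recursive get, not fsspec's actual code, and the bucket keys are made up. On Linux and macOS, opening an existing directory in "wb" mode raises the same IsADirectoryError (errno 21) as in the report; Windows raises PermissionError instead.

```python
import os
import tempfile

# Hypothetical listing: "bucket/data" is a placeholder key named after the
# directory itself, alongside the real files nested under it.
REMOTE_KEYS = ["bucket/data", "bucket/data/a.csv", "bucket/data/b.csv"]
RPATH = "bucket/data"


def local_target(key, rpath, lpath):
    """Map a remote key to a local path (simplified, hypothetical rule)."""
    rel = key[len(rpath):].lstrip("/")
    # The placeholder key has no relative part, so it maps onto lpath itself.
    return os.path.join(lpath, rel) if rel else lpath


def naive_get(keys, rpath, lpath):
    """Write every listed key as a local file, creating parents as needed."""
    for key in keys:
        target = local_target(key, rpath, lpath)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:  # fails when target is a directory
            f.write(b"")


def reproduce(keys=REMOTE_KEYS):
    """Run naive_get into an existing local directory; return the error, if any."""
    with tempfile.TemporaryDirectory() as lpath:
        try:
            naive_get(keys, RPATH, lpath)
        except OSError as exc:
            return exc
    return None


err = reproduce()
print(type(err).__name__, err.errno if err else None)
```

Calling `reproduce(REMOTE_KEYS[1:])`, i.e. without the placeholder key, completes without error, which fits martindurant's later suspicion that the earlier version simply did not return that key.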

vincentlepage • Apr 08 '20 17:04

Does that "dir" correspond to a real, existing key in the bucket? We could add an exception in fsspec (or here, although the same applies to s3fs) so that zero-length files are not written, or explicitly prune out paths that appear to have deeper nested entries under them.
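The two workarounds suggested here can be sketched as a filter over the remote listing before anything is written locally. This is a hypothetical helper, not fsspec or gcsfs code; it assumes the listing has been reduced to a key-to-size mapping. Pruning drops any key that has other keys nested under it, while the optional zero-length rule also drops placeholders but, as the example shows, discards genuinely empty files too.

```python
def prune_directory_keys(listing, skip_empty=False):
    """Return, sorted, the keys that are safe to write as regular local files.

    listing: mapping of remote key -> size in bytes (a hypothetical shape;
    a real fsspec/gcsfs listing carries more fields).
    skip_empty: also drop every zero-length object (the first suggestion).
    """
    # Collect every proper parent prefix, e.g. "b/d/x.csv" -> "b" and "b/d".
    parents = set()
    for key in listing:
        parts = key.rstrip("/").split("/")
        for i in range(1, len(parts)):
            parents.add("/".join(parts[:i]))

    kept = []
    for key, size in listing.items():
        if key.rstrip("/") in parents:
            continue  # something is nested under this key: treat it as a dir
        if skip_empty and size == 0:
            continue  # drops placeholders, but also genuinely empty files
        kept.append(key)
    return sorted(kept)


listing = {
    "bucket/data": 0,            # placeholder named after the directory
    "bucket/data/a.csv": 10,
    "bucket/data/sub/": 0,       # placeholder with a trailing slash
    "bucket/data/sub/b.csv": 5,
    "bucket/data/empty.txt": 0,  # a real, intentionally empty file
}
print(prune_directory_keys(listing))
print(prune_directory_keys(listing, skip_empty=True))
```

Pruning alone keeps `bucket/data/empty.txt`; adding `skip_empty=True` loses it, which is the trade-off between the two suggestions.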

martindurant • Apr 08 '20 17:04

Yes, the URI points to an existing key in the bucket. Our use case is very generic, and the issue should arise in any case with recursive=True. The odd thing is that it worked perfectly on 0.6.0. I may spend some time tomorrow figuring out which change introduced this behavior.

vincentlepage • Apr 08 '20 18:04

If there is a key with the same path as a directory, this problem is to be expected, but of course it would be better to work around it. I suspect that in the previous version, the key with the name of a directory simply wasn't being returned by gcsfs, which is also wrong.
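To illustrate why such a key is expected to clash: in an object store, keys are flat strings, so a key named like a directory can coexist with keys nested under it, whereas a local filesystem cannot hold a file and a directory at the same path. The sketch below is a hypothetical, stdlib-only illustration (the `BUCKET` dict and `materialize` helper are made up, not gcsfs or fsspec code). Written file-first, the collision surfaces as FileExistsError rather than the IsADirectoryError in the traceback.

```python
import os
import tempfile

# A flat key -> bytes mapping standing in for a bucket. Both keys coexist,
# because to an object store "data" and "data/a.csv" are unrelated strings.
BUCKET = {"data": b"", "data/a.csv": b"x,y\n1,2\n"}


def materialize(objects, root):
    """Write each key under root in sorted order; return the first OSError, if any."""
    for key in sorted(objects):
        target = os.path.join(root, *key.split("/"))
        try:
            # For "data/a.csv" this must create root/data as a directory,
            # but root/data was already written as a regular file.
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as f:
                f.write(objects[key])
        except OSError as exc:
            return exc
    return None


with tempfile.TemporaryDirectory() as root:
    err = materialize(BUCKET, root)
    print(type(err).__name__)
```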

martindurant • Apr 08 '20 18:04