universal_pathlib

FileNotFoundError in write_bytes to Google Storage

Open · cgarciae opened this issue 4 years ago · 3 comments

Hey! I am trying to write a pickled object to Google Cloud Storage and I am getting a FileNotFoundError. Here is a minimal script that reproduces the problem:

from upath import UPath
import pickle

x = [1,2,3]
path = UPath("gs://some-bucket/x.pkl")

path.write_bytes(pickle.dumps(x))

Output:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    path.write_bytes(pickle.dumps(x))
  File "/home/conda/environments/aes_pinned/lib/python3.8/pathlib.py", line 1246, in write_bytes
    return f.write(view)
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/fsspec/spec.py", line 1602, in __exit__
    self.close()
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/fsspec/spec.py", line 1569, in close
    self.flush(force=True)
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/fsspec/spec.py", line 1435, in flush
    self._initiate_upload()
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/core.py", line 1236, in _initiate_upload
    self.location = sync(
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/fsspec/asyn.py", line 69, in sync
    raise result[0]
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/core.py", line 1324, in initiate_upload
    headers, _ = await fs._call(
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/core.py", line 340, in _call
    status, headers, info, contents = await self._request(
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/decorator.py", line 221, in fun
    return await caller(func, *(extras + args), **kw)
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/retry.py", line 110, in retry_request
    return await func(*args, **kwargs)
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/core.py", line 332, in _request
    validate_response(status, contents, path)
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/retry.py", line 89, in validate_response
    raise FileNotFoundError
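
Following the traceback from top to bottom, the exception is not raised by `f.write(view)` itself but by the file's `__exit__`, which calls `close()`, then `flush(force=True)`, then gcsfs's `_initiate_upload()`. The written bytes are buffered locally and, here, the upload to Google Cloud Storage is only initiated when the file is closed, so a rejected request surfaces at the end of the `with` block. Below is a stdlib-only sketch of that deferred-upload pattern; `DeferredUploadFile`, `failing_upload`, and `write_bytes_like_pathlib` are hypothetical stand-ins, not the gcsfs or fsspec API.

```python
import io
from typing import Callable


class DeferredUploadFile(io.RawIOBase):
    """Hypothetical write-only file that buffers bytes and uploads them on close."""

    def __init__(self, upload: Callable[[bytes], None]) -> None:
        super().__init__()
        self._buffer = bytearray()
        self._upload = upload

    def writable(self) -> bool:
        return True

    def write(self, data) -> int:
        # Writing only appends to the local buffer; nothing is sent yet
        self._buffer.extend(data)
        return len(data)

    def close(self) -> None:
        if not self.closed:
            try:
                # The upload is deferred until close, so upload errors surface here
                self._upload(bytes(self._buffer))
            finally:
                super().close()


def failing_upload(data: bytes) -> None:
    # Stand-in for a storage service that rejects the request (e.g. an HTTP 404)
    raise FileNotFoundError("upload target not found")


def write_bytes_like_pathlib(data: bytes) -> int:
    # Same shape as pathlib's write_bytes: open, write a memoryview, close via `with`
    with DeferredUploadFile(failing_upload) as f:
        return f.write(memoryview(data))
```

Calling `write_bytes_like_pathlib(b"abc")` raises `FileNotFoundError` from `close()` even though `write` succeeded, which is the same shape as the traceback above.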

Interestingly, DataFrame.to_csv works for objects in that bucket, so I don't think it's a permissions issue.

cgarciae · Jul 26 '21 21:07

We ended up writing some custom functions that wrap gcsfs, since UPath's write_bytes, read_bytes, and exists were misbehaving for gs:// paths:

import gcsfs
from upath import UPath

# Shared GCS client; gcsfs uses the default credentials from the environment
FS = gcsfs.GCSFileSystem()


def write_bytes(path: UPath, data: bytes) -> None:
    # Route gs:// URLs through gcsfs directly, passing a plain string
    if str(path).startswith("gs://"):
        with FS.open(str(path), "wb") as f:
            f.write(data)
    else:
        path.write_bytes(data)


def read_bytes(path: UPath) -> bytes:
    if str(path).startswith("gs://"):
        with FS.open(str(path), "rb") as f:
            return f.read()
    else:
        return path.read_bytes()


def exists(path: UPath) -> bool:
    if str(path).startswith("gs://"):
        return FS.exists(str(path))
    else:
        return path.exists()

cgarciae · Jul 27 '21 18:07

Thanks for raising this issue! I'm working on a GCSPath implementation to handle some of the edge cases that the general UPath object doesn't handle quite right, which cause these errors. I hope to have it out shortly.

andrewfulton9 · Jul 28 '21 14:07

I am re-implementing these as part of the GCSPath subclass work. This was helpful! Thank you!

kcpevey · Aug 04 '21 17:08

Just to be sure, I tested again, and this should no longer be an issue. It's also covered in our BaseTests.

If it's still an issue, please reopen and provide some info about your environment!
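
Running one shared suite of I/O checks against every path implementation is what keeps a regression like this from slipping back in. A minimal sketch of that mixin pattern with the stdlib `unittest` module follows; `RoundTripTestsMixin`, `make_path`, and `LocalPathRoundTripTests` are hypothetical names and do not reproduce universal_pathlib's actual BaseTests. A GCS variant would subclass the same mixin and return a `UPath("gs://...")` from `make_path`.

```python
import pathlib
import pickle
import tempfile
import unittest


class RoundTripTestsMixin:
    """Backend-agnostic checks; concrete test classes supply make_path()."""

    def make_path(self, name: str):
        raise NotImplementedError

    def test_write_then_read_bytes(self):
        path = self.make_path("x.pkl")
        payload = pickle.dumps([1, 2, 3])
        path.write_bytes(payload)
        self.assertEqual(path.read_bytes(), payload)
        self.assertEqual(pickle.loads(path.read_bytes()), [1, 2, 3])

    def test_exists_after_write(self):
        path = self.make_path("y.bin")
        self.assertFalse(path.exists())
        path.write_bytes(b"data")
        self.assertTrue(path.exists())


class LocalPathRoundTripTests(RoundTripTestsMixin, unittest.TestCase):
    """Runs the shared checks against the local filesystem via pathlib."""

    def setUp(self):
        # Fresh temporary directory per test, removed automatically afterwards
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)

    def make_path(self, name: str) -> pathlib.Path:
        return pathlib.Path(self._tmp.name) / name
```

Because the mixin does not inherit from `unittest.TestCase`, `python -m unittest` only collects the concrete subclasses, so each backend runs the same assertions.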

ap-- · Aug 30 '23 18:08