universal_pathlib
FileNotFoundError in write_bytes to Google Storage
Hey! I am trying to write a pickled object to Google Storage and I am getting a FileNotFoundError. Here is a minimal script reproducing the problem:
from upath import UPath
import pickle
x = [1,2,3]
path = UPath("gs://some-bucket/x.pkl")
path.write_bytes(pickle.dumps(x))
Output:
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    path.write_bytes(pickle.dumps(x))
  File "/home/conda/environments/aes_pinned/lib/python3.8/pathlib.py", line 1246, in write_bytes
    return f.write(view)
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/fsspec/spec.py", line 1602, in __exit__
    self.close()
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/fsspec/spec.py", line 1569, in close
    self.flush(force=True)
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/fsspec/spec.py", line 1435, in flush
    self._initiate_upload()
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/core.py", line 1236, in _initiate_upload
    self.location = sync(
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/fsspec/asyn.py", line 69, in sync
    raise result[0]
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/core.py", line 1324, in initiate_upload
    headers, _ = await fs._call(
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/core.py", line 340, in _call
    status, headers, info, contents = await self._request(
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/decorator.py", line 221, in fun
    return await caller(func, *(extras + args), **kw)
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/retry.py", line 110, in retry_request
    return await func(*args, **kwargs)
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/core.py", line 332, in _request
    validate_response(status, contents, path)
  File "/home/conda/environments/aes_pinned/lib/python3.8/site-packages/gcsfs/retry.py", line 89, in validate_response
    raise FileNotFoundError
Interestingly, DataFrame.to_csv works for objects in that bucket, so I don't think it's a matter of permissions.
We ended up implementing some custom functions wrapping gcsfs since write_bytes, read_bytes, and exists are misbehaving:
import gcsfs
from upath import UPath

FS = gcsfs.GCSFileSystem()

def write_bytes(path: UPath, data: bytes):
    if "gs://" in str(path):
        with FS.open(path, "wb") as f:
            f.write(data)
    else:
        path.write_bytes(data)

def read_bytes(path: UPath) -> bytes:
    if "gs://" in str(path):
        with FS.open(path, "rb") as f:
            return f.read()
    else:
        return path.read_bytes()

def exists(path: UPath) -> bool:
    if "gs://" in str(path):
        return FS.exists(path)
    else:
        return path.exists()
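As a side note on the workaround above: the `"gs://" in str(path)` test is a substring match, so a slightly stricter variant would compare the parsed URL scheme instead. The sketch below uses only the standard library; `is_gcs_url` and its `schemes` parameter are hypothetical names introduced here for illustration, not part of upath or gcsfs.

```python
from urllib.parse import urlsplit


def is_gcs_url(path, schemes=("gs",)) -> bool:
    """Return True if str(path) parses as a URL whose scheme is in `schemes`.

    `is_gcs_url` is a hypothetical helper; it only inspects the string
    form of `path` and never touches the network.
    """
    return urlsplit(str(path)).scheme in schemes


if __name__ == "__main__":
    print(is_gcs_url("gs://some-bucket/x.pkl"))
    print(is_gcs_url("/tmp/x.pkl"))
```

Each `if "gs://" in str(path)` branch in the wrappers could then call this helper, which keeps the dispatch rule in one place.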
Thanks for raising this issue! I'm working on a GCSPath implementation to handle some of the edge cases that the general UPath object doesn't handle quite right, which is what causes these errors. I hope to have that out shortly.
I am re-implementing these as part of the GCSPath subclass work. This was helpful! Thank you!
Just to be sure, I tested again, and this should not be an issue anymore.
It's also covered in our BaseTests.
If it's still an issue, please reopen and provide some info about your environment!
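For anyone reopening, one possible way to collect the environment info is a small script like the sketch below. It assumes Python 3.8+ (for `importlib.metadata`) and that the relevant distributions are named `universal_pathlib`, `fsspec`, and `gcsfs`; `environment_report` is a hypothetical helper name, not something upath provides.

```python
import platform
from importlib import metadata


def environment_report(packages=("universal_pathlib", "fsspec", "gcsfs")):
    """Collect Python, platform, and installed package versions in a dict."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue, together with the full traceback, should cover the basics.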