universal_pathlib
Provide `key`/`object_name`/`blob` attribute in CloudPath
Suppose I have a remote path as follows:
gcs_path = GCSPath("gs://bucket-name/A/B/C/filename.tar.gz")
I'd like a way to get a UPath object that represents the location in the bucket, e.g.
UPath("A/B/C/filename.tar.gz")
I'd like to avoid my present workaround of `"/".join(gcs_path.parts[1:])`, since it's not immediately clear what this code is doing.
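For illustration, the intent of that workaround can be made explicit with a small named helper. This is only a sketch using the standard library on the URL string; `object_name_from_url` is a hypothetical name and not part of universal_pathlib.

```python
from urllib.parse import urlsplit


def object_name_from_url(url: str) -> str:
    """Return the part of a bucket URL that follows the bucket name.

    Hypothetical helper for illustration only; not a universal_pathlib API.
    """
    # urlsplit places the bucket in .netloc and the remainder in .path,
    # which starts with a slash that is stripped here.
    return urlsplit(url).path.lstrip("/")


print(object_name_from_url("gs://bucket-name/A/B/C/filename.tar.gz"))
# A/B/C/filename.tar.gz
```

This operates on the string form of the path, so it would be called as `object_name_from_url(str(gcs_path))`, assuming `str()` of the path returns the full `gs://` URL.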
Possibly related to https://github.com/fsspec/universal_pathlib/issues/170
Can you provide more context?
The following style for creating paths is implemented for s3, gcs and az object storage:
>>> import upath
>>> upath.UPath("A/B/C/filename.tar.gz", protocol="gs", bucket="bucket-name")
GCSPath('gs://bucket-name/A/B/C/filename.tar.gz')
I need the other way around. I have a GCSPath object, and now I'd like to extract the path itself (i.e. without the protocol and bucket) using one of the object's methods or attributes. Is there a way to do that?
So from my understanding, to stay in the Google Cloud Storage vocabulary, you'd want the OBJECT_NAME, whereas right now you can only retrieve PATH_TO_RESOURCE = BUCKET_NAME/OBJECT_NAME.
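As a rough sketch of that relationship, a PATH_TO_RESOURCE string can be split at its first slash into the two parts. `split_resource` is a hypothetical helper for illustration, working on plain strings rather than on a path object.

```python
def split_resource(path_to_resource):
    """Split 'BUCKET_NAME/OBJECT_NAME' into (bucket_name, object_name).

    Hypothetical helper for illustration; only the first slash separates
    the bucket from the object name, later slashes belong to the object name.
    """
    bucket_name, _, object_name = path_to_resource.partition("/")
    return bucket_name, object_name


print(split_resource("bucket-name/A/B/C/filename.tar.gz"))
# ('bucket-name', 'A/B/C/filename.tar.gz')
```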
This will become generally available once either the relative_to behaviour is fixed (which requires some more thought before rolling out) or URL chaining is implemented (#28), at which point you could use dirfs to remove the prefix from the path.
In a future version, `.key` could be made available using the implementation below:
>>> import upath
>>> x = upath.UPath("gs://bucket/abc/efg/file.txt")
>>> x.path.removeprefix(x.anchor)
'abc/efg/file.txt'
More generally, relative path behavior can be used for this (upath>=0.3.0):
>>> import upath
>>> x = upath.UPath("gs://bucket/abc/efg/file.txt")
>>> y = upath.UPath("gs://bucket/")
>>> z = x.relative_to(y)
>>> z
<relative GCSPath 'abc/efg/file.txt'>
>>> str(z)
'abc/efg/file.txt'
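For readers without a bucket at hand, the same idea can be tried with the standard library. This is only an analogy using `pathlib.PurePosixPath`, with the bucket modelled as the first path segment; it does not exercise upath itself, and the repr of the result differs from the one shown above.

```python
from pathlib import PurePosixPath

# Stand-ins for the object path and the bucket root (no protocol here).
full = PurePosixPath("bucket/abc/efg/file.txt")
base = PurePosixPath("bucket")

# relative_to drops the leading segments that match `base`.
relative = full.relative_to(base)

print(str(relative))           # abc/efg/file.txt
print(relative.is_absolute())  # False
```

If `base` is not a prefix of `full`, `relative_to` raises `ValueError` in pathlib, so the caller needs to know which bucket root to pass.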