Testing existence of bucket incorrect
GCSFileSystem().exists(<bucket>) does not work (anymore) on buckets in 'other' projects - it returns False even though the service account has full access to the project (the account is project owner in all of them).
However using <bucket>/<file> does work and returns True.
Is this new in gcsfs 0.5.x?
Sorry, yes, 0.5.3. It works as expected with 0.2.x. FYI, we have not enabled Requester Pays, so checking the existence of a bucket should depend only on access rights.
And what about 0.4.x? I'm trying to figure out if it's related to requester pays (0.5.x) or some other change.
Anything you can do to pin down where this stopped working would be helpful.
exists for a directory (bucket or not) should test whether ls on that path returns anything; for a bucket, I suppose an empty ls (but without an error) would also mean it exists. Perhaps the code is trying to call parent on the path, which would indeed amount to testing whether the bucket is one of those owned by the user.
Doesn't work with 0.4.0 either, no error but returns False.
A quick debug seems to indicate that there is a check against the list of buckets in the account-project. This is incorrect as existence should check regardless as long as the service account has access to it.
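To make the suspected difference concrete, here is a toy model of the two strategies. It is only a sketch: ToyStore, list_project_buckets and can_list are made-up names for illustration, not the gcsfs API, and the actual gcsfs code is not quoted in this thread.

```python
class ToyStore:
    """In-memory stand-in for a storage service (hypothetical, not gcsfs)."""

    def __init__(self, own_project_buckets, accessible_buckets):
        # Buckets that belong to the project the client was created with.
        self.own_project_buckets = set(own_project_buckets)
        # Buckets the service account is allowed to list, in any project.
        self.accessible_buckets = set(accessible_buckets)

    def list_project_buckets(self):
        return sorted(self.own_project_buckets)

    def can_list(self, bucket):
        return bucket in self.accessible_buckets


def exists_via_project_listing(store, bucket):
    # Suspected current behaviour: look the bucket up in the project's own list.
    return bucket in store.list_project_buckets()


def exists_via_access(store, bucket):
    # Expected behaviour: the bucket exists if the account can list it.
    return store.can_list(bucket)
```

With a bucket that lives in another project but is accessible, the first check reports False and the second reports True, which matches what I am seeing.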
Worth adding:
exists(bucket-other-project) returns False
ls(bucket-other-project) returns list of files (immediate children)
Btw, 0.3.1 works: exists(bucket-other-project) returns True
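For anyone who wants to reproduce the comparison across versions, a small helper along these lines may be handy. It is a sketch that assumes only that the filesystem object has exists(path) and ls(path) methods and that ls raises FileNotFoundError for a path it cannot find; compare_exists_and_ls is a name made up here.

```python
def compare_exists_and_ls(fs, path):
    """Return (exists_result, listing) for path; listing is None if ls fails.

    A mismatch such as (False, [...]) is the behaviour reported in this issue.
    """
    exists_result = fs.exists(path)
    try:
        listing = fs.ls(path)
    except FileNotFoundError:
        listing = None
    return exists_result, listing


# Intended use (needs gcsfs and credentials, so it is left as a comment):
#   import gcsfs
#   fs = gcsfs.GCSFileSystem()
#   print(gcsfs.__version__, compare_exists_and_ls(fs, "bucket-other-project"))
```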
Sounds like exists should have a special-case override for buckets, since in all lower levels, you can tell if a bucket exists by looking at the parent listing.
Last thing I tried, on 0.5.3,
with fs=gcsfs.GCSFileSystem(project=<another-project-where-account-is-owner>):
- ls and exists on a bucket in that project error out: Requester pays bucket access requires authentication - not sure why, since it is not enabled
with fs=gcsfs.GCSFileSystem(user_project=<another-project-where-account-is-owner>):
- same result as https://github.com/dask/gcsfs/issues/222#issuecomment-564078005
OK, thanks for testing.
> This is incorrect as existence should check regardless as long as the service account has access to it.
That sounds right to me. I'll see if I can implement a different check.
@TomAugspurger it seems like catching FileNotFoundError on ls(bucket) would do
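Roughly what I have in mind, as a sketch rather than a patch: bucket_exists is a hypothetical standalone helper, and it assumes ls raises FileNotFoundError when the bucket is missing. An empty listing without an error counts as "exists", as suggested earlier in the thread.

```python
def bucket_exists(fs, bucket):
    """Treat a bucket as existing if listing it does not raise FileNotFoundError."""
    try:
        fs.ls(bucket)  # the result is ignored; an empty bucket still exists
    except FileNotFoundError:
        return False
    return True
```

Any other exception is left to propagate, so authentication or billing errors would still surface instead of being reported as a missing bucket.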
@yiga2 I haven't had a chance to work on this. Are you able to look into it?
And you might want to verify that things are still broken with 0.6.0. I suspect they are, but just in case.