Testing existence of bucket incorrect

Open yan-hic opened this issue 6 years ago • 12 comments

GCSFileSystem().exists(<bucket>) does not work (anymore) on buckets in 'other' projects: it returns False even though the service account has full access to the project (the account is project owner in all of them).

However using <bucket>/<file> does work and returns True.

yan-hic avatar Dec 10 '19 06:12 yan-hic

Is this new in gcsfs 0.5.x?

TomAugspurger avatar Dec 10 '19 14:12 TomAugspurger

Sorry, yes, 0.5.3. It works as expected with 0.2.x. FYI, we have not enabled Requester Pays, so checking the existence of a bucket should depend only on access rights.

yan-hic avatar Dec 10 '19 14:12 yan-hic

And what about 0.4.x? I'm trying to figure out if it's related to requester pays (0.5.x) or some other change.

TomAugspurger avatar Dec 10 '19 14:12 TomAugspurger

Anything you can do to pin down where this stopped working would be helpful.

TomAugspurger avatar Dec 10 '19 14:12 TomAugspurger

exists for a directory (bucket or not) should test whether ls on that path returns anything; for a bucket, I suppose an empty ls (but without an error) would also mean it exists. Perhaps the code is trying to call parent on the path, which would indeed amount to testing whether the bucket is one of those owned by the user.

martindurant avatar Dec 10 '19 14:12 martindurant

Doesn't work with 0.4.0 either, no error but returns False.

A quick debug seems to indicate that there is a check against the list of buckets in the account-project. This is incorrect as existence should check regardless as long as the service account has access to it.

Worth adding:

  • exists(bucket-other-project) returns False
  • ls(bucket-other-project) returns list of files (immediate children)
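
To make the suspected mechanism concrete, here is a toy model, not gcsfs code. ToyFS and its attributes are hypothetical names invented for illustration. It assumes exists only consults the bucket listing of the client's own project, while ls goes straight to the bucket:

```python
class ToyFS:
    """Toy stand-in for a filesystem client; this is NOT the gcsfs implementation."""

    def __init__(self, own_project_buckets, readable_buckets):
        # Buckets that show up when listing the client's own project.
        self.own_project_buckets = set(own_project_buckets)
        # Every bucket the account can read, mapped to its object names.
        self.readable_buckets = dict(readable_buckets)

    def ls(self, bucket):
        # Listing depends only on access rights to the bucket itself.
        if bucket not in self.readable_buckets:
            raise FileNotFoundError(bucket)
        return [f"{bucket}/{name}" for name in self.readable_buckets[bucket]]

    def exists(self, bucket):
        # Suspected behaviour: look only at the own-project bucket listing.
        return bucket in self.own_project_buckets


fs = ToyFS(
    own_project_buckets=["bucket-own-project"],
    readable_buckets={
        "bucket-own-project": ["a.csv"],
        "bucket-other-project": ["b.csv"],
    },
)

print(fs.exists("bucket-other-project"))  # False
print(fs.ls("bucket-other-project"))      # ['bucket-other-project/b.csv']
```

Under that assumption the two calls disagree in exactly the way reported above.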

yan-hic avatar Dec 10 '19 15:12 yan-hic

Btw, 0.3.1 works: exists(bucket-other-project) returns True

yan-hic avatar Dec 10 '19 15:12 yan-hic

Sounds like exists should have a special-case override for buckets, since in all lower levels, you can tell if a bucket exists by looking at the parent listing.

martindurant avatar Dec 10 '19 15:12 martindurant

Last thing I tried, on 0.5.3,

with fs=gcsfs.GCSFileSystem(project=<another-project-where-account-is-owner>):

  • ls and exists on a bucket in that project both error out with: Requester pays bucket access requires authentication. Not sure why, since it is not enabled

with fs=gcsfs.GCSFileSystem(user_project=<another-project-where-account-is-owner>):

  • same result as https://github.com/dask/gcsfs/issues/222#issuecomment-564078005

yan-hic avatar Dec 10 '19 15:12 yan-hic

OK, thanks for testing.

This is incorrect as existence should check regardless as long as the service account has access to it.

That sounds right to me. I'll see if I can implement a different check.

TomAugspurger avatar Dec 10 '19 15:12 TomAugspurger

@TomAugspurger it seems like catching FileNotFoundError on ls(bucket) would do
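
A minimal sketch of that idea, assuming ls raises FileNotFoundError for a missing bucket and returns a (possibly empty) list otherwise. bucket_exists and _StubFS are hypothetical names used only for illustration; fs can be any object with an ls method:

```python
def bucket_exists(fs, bucket):
    """Return True if listing `bucket` succeeds, even when the listing is empty."""
    try:
        fs.ls(bucket)
    except FileNotFoundError:
        # Assumption: a missing bucket surfaces as FileNotFoundError.
        return False
    return True


class _StubFS:
    """Tiny in-memory stand-in so the sketch runs without credentials."""

    def __init__(self, buckets):
        self._buckets = buckets

    def ls(self, bucket):
        if bucket not in self._buckets:
            raise FileNotFoundError(bucket)
        return list(self._buckets[bucket])


stub = _StubFS({"bucket-other-project": [], "bucket-own-project": ["a.csv"]})
print(bucket_exists(stub, "bucket-other-project"))  # True
print(bucket_exists(stub, "no-such-bucket"))        # False
```

With a real client this would be called as bucket_exists(gcsfs.GCSFileSystem(), "<bucket>"). Other errors, such as permission failures, are deliberately left to propagate.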

yan-hic avatar Dec 10 '19 15:12 yan-hic

@yiga2 I haven't had a chance to work on this. Are you able to look into it?

And you might want to verify that things are still broken with 0.6.0. I suspect they are, but just in case.

TomAugspurger avatar Dec 20 '19 20:12 TomAugspurger