`path.iterdir()` yields `path` itself as the first item (with S3)
Given the following objects on S3:
s3://my-bucket/my-directory/0.txt
s3://my-bucket/my-directory/1.txt
`UPath("s3://my-bucket/my-directory").iterdir()` yields:
s3://my-bucket/my-directory
s3://my-bucket/my-directory/0.txt
s3://my-bucket/my-directory/1.txt
The first item is wrong, right?
Hi @danielgafni
Thank you for reporting! Could you check 3 things:
- Did this change with recent fsspec versions? (Just try installing the newest release and compare it with one from before 2024.)
- If you list the contents of the bucket using only filesystem_spec (without upath), does it return an entry named `my-bucket/my-directory`?
- Was this S3 bucket manually modified using the AWS web UI (i.e. were files uploaded via the web UI)?
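For question (2), a small helper can make the check explicit: given the raw names returned by a listing, does the directory itself appear among them? This is a minimal sketch; the function names `_normalize` and `listing_contains_self` are hypothetical, and it assumes only the `s3://` protocol needs stripping.

```python
def _normalize(path: str) -> str:
    """Strip an optional "s3://" protocol prefix and any trailing slashes."""
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    return path.rstrip("/")


def listing_contains_self(directory: str, entries: list[str]) -> bool:
    """Return True if `entries` (e.g. names from a raw listing call)
    include `directory` itself. Hypothetical helper, not part of upath."""
    target = _normalize(directory)
    return any(_normalize(entry) == target for entry in entries)


# Listing as reported in this issue (protocol already stripped).
reported = [
    "my-bucket/my-directory",
    "my-bucket/my-directory/0.txt",
    "my-bucket/my-directory/1.txt",
]
print(listing_contains_self("s3://my-bucket/my-directory", reported))  # True
```

Against the real bucket, the list would come from fsspec directly, e.g. `fsspec.filesystem("s3").ls("my-bucket/my-directory", detail=False)` (this needs `s3fs` installed and valid credentials).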
Cheers, Andreas 😊
Hey!
I can answer (1) and (3) right away:
- My `fsspec` version was `2024.3.1`
- No, it wasn't
Regarding (2), I will be able to check a bit later
I checked it: this seems to be an issue with fsspec, which shows the same behavior.
I wouldn't call it a problem. There's something stored under that key in the bucket. We should just define and document the behavior of upath in cases like these.
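One behavior that could be defined and documented is "`iterdir()` never yields the directory itself". The sketch below shows that policy as a plain filter over a raw listing. It is a hypothetical illustration (`iter_children` is not an upath or fsspec function) and it assumes entries are compared after stripping trailing slashes.

```python
from typing import Iterable, Iterator


def _strip(path: str) -> str:
    # Compare keys without a trailing "/" so "dir" and "dir/" match.
    return path.rstrip("/")


def iter_children(directory: str, entries: Iterable[str]) -> Iterator[str]:
    """Yield entries of a raw listing, skipping any entry that refers to
    `directory` itself. Hypothetical policy, not upath's actual code."""
    own = _strip(directory)
    for entry in entries:
        if _strip(entry) == own:
            continue  # never yield the directory itself
        yield entry


raw = [
    "my-bucket/my-directory",
    "my-bucket/my-directory/0.txt",
    "my-bucket/my-directory/1.txt",
]
print(list(iter_children("my-bucket/my-directory", raw)))
# ['my-bucket/my-directory/0.txt', 'my-bucket/my-directory/1.txt']
```

The trade-off is that silently skipping the entry hides whatever object is stored under that key, so the alternative (keep yielding it, but document why) is equally valid as long as the behavior is written down.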
No object with this key exists in the bucket. Also, it happens with any "directory" (common path prefix) in the bucket, not just a specific one.
Also, the output of `aws s3 ls` doesn't contain the problematic key.
Could you create a PR with a test case in the upath S3 tests that reproduces the issue? That would be super helpful for finding a solution
This might be relevant: https://medium.com/cyberark-engineering/the-strange-case-of-amazon-s3-bucket-folders-c8d113a8dd01
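One possible explanation, not confirmed in this thread: S3 has a flat key namespace, and creating a "folder" (e.g. via the S3 console) stores a zero-byte object whose key ends in `/`, such as `my-directory/`. That key differs from `my-directory`, yet a listing that returns every key under the prefix and strips trailing slashes would report the directory itself. The sketch below simulates this with plain strings; `naive_ls` is hypothetical and only handles one directory level.

```python
def naive_ls(keys: list[str], directory: str) -> list[str]:
    """List keys directly under `directory` in a flat key namespace,
    stripping trailing slashes from the returned names."""
    prefix = directory.rstrip("/") + "/"
    result = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if "/" in rest.rstrip("/"):
            continue  # deeper than one level; skipped in this sketch
        # A marker key "my-directory/" has rest == "" and becomes "my-directory".
        result.append(key.rstrip("/"))
    return sorted(result)


without_marker = ["my-directory/0.txt", "my-directory/1.txt"]
with_marker = ["my-directory/", "my-directory/0.txt", "my-directory/1.txt"]
print(naive_ls(without_marker, "my-directory"))
# ['my-directory/0.txt', 'my-directory/1.txt']
print(naive_ls(with_marker, "my-directory"))
# ['my-directory', 'my-directory/0.txt', 'my-directory/1.txt']
```

If this is the cause, listing the raw keys with a `my-directory/` prefix (rather than relying on the path-like listing) should reveal the zero-byte marker.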