
Can't access data from S3 Buckets

Open omertechverx opened this issue 2 years ago • 14 comments

import fsspec
s3_fs = fsspec.filesystem(
    "s3",
    key="xxxxxx",
    secret="xxxxxxxxx",
    client_kwargs={"region_name": "xxxxxxx"},
)
s3_fs.ls('s3://')

This lists all the buckets, but when I use

s3_fs.ls('s3://Bucket_name')

it returns an empty list.

The same bucket and all of its contents can be accessed with boto3, but fsspec returns nothing. Please help me solve this issue.

If I try to read a file using fsspec.open, it gives a Bad Request error, apparently because the file was not found.

I have also tried s3fs directly, and it has the same issue.

omertechverx avatar Feb 26 '23 10:02 omertechverx

What version of s3fs are you using? For boto3, are you also using key/secret, or something else?

cc https://github.com/fsspec/s3fs/issues/701

martindurant avatar Feb 26 '23 14:02 martindurant
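For anyone answering the version question above, here is a minimal sketch for reporting the relevant package versions. The helper name `installed_version` and the list of packages are illustrative choices, not something taken from this thread; only the standard-library `importlib.metadata` module is used.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages that are usually involved when fsspec talks to S3 (an assumed list).
for name in ("fsspec", "s3fs", "aiobotocore", "botocore", "boto3"):
    print(f"{name}: {installed_version(name)}")
```

Pasting the printed lines into a report shows at a glance whether fsspec and s3fs come from matching releases.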

cc https://github.com/fsspec/s3fs/issues/700

martindurant avatar Feb 26 '23 14:02 martindurant

Hi Martin, thanks for the quick response. I am using this version: s3fs==2023.1.0

I solved it by including the required bucket in the endpoint: endpoint_url: 'https://s3.amazonaws.com/'+S3_BUCKET_NAME

But if I don't specify the endpoint for my bucket, it lists all the buckets but not the contents of the bucket. Is that the usual behavior?

omertechverx avatar Feb 26 '23 14:02 omertechverx

That is fascinating and also mysterious - definitely not how it should work.

@elephantum you were working with endpoint_url, does this ring any bells?

@Eugeny , maybe an interaction with transient bucket state?

@isidentical , long shot, but maybe related to bucket regions?

martindurant avatar Feb 26 '23 16:02 martindurant

This is strange; I have not encountered an S3 endpoint_url of the form s3.amazonaws.com/{BUCKET_NAME}

Just a hunch: does it work with just https://s3.amazonaws.com as the endpoint? If yes, then there is probably a misconfiguration in some AWS-related config.

elephantum avatar Feb 26 '23 18:02 elephantum

> This is strange; I have not encountered an S3 endpoint_url of the form s3.amazonaws.com/{BUCKET_NAME}
>
> Just a hunch: does it work with just https://s3.amazonaws.com as the endpoint? If yes, then there is probably a misconfiguration in some AWS-related config.

No, it does not work.

omertechverx avatar Feb 27 '23 07:02 omertechverx

import fsspec
bucket_name = 'test'
config = {
    'key': 'xxxxxxxxxxxxxxx',
    'secret': 'xxxxxxxxxxxxxxx',
    'client_kwargs': {
        "endpoint_url": 'https://s3.amazonaws.com/' + bucket_name,
        "region_name": 'region',
    },
}
s3 = fsspec.filesystem('s3', **config)
file_name = 'xyz'
with s3.open(f"{bucket_name}/{file_name}", "rb") as f:
    file_contents = f.read()
print(file_contents)

This is the complete code that I am using to read the file from the bucket, and it works. If I change the endpoint URL, it stops working: it lists the buckets but not the contents of the buckets. I get the same behavior if I remove client_kwargs altogether.

It gives a Bad Request error when the bucket name is not included in the endpoint URL, which I think is caused by a file-not-found error. I have tried to access both public and private buckets without the bucket name in the endpoint URL, and neither can be accessed.

omertechverx avatar Feb 27 '23 07:02 omertechverx

Would you mind listing the set of AWS_ environment variables you have defined (not their values, except where safe)? Do you have .boto or .aws files? Are you running this from within an AWS service?

martindurant avatar Feb 27 '23 13:02 martindurant
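One way to gather what is asked for above without leaking secrets is sketched here. It prints only the names of AWS_ environment variables and reports whether the usual config files exist. The function names are illustrative, and the file locations (`~/.aws/config`, `~/.aws/credentials`, `~/.boto`) are the conventional defaults, assumed rather than confirmed for this setup.

```python
import os


def aws_env_names(environ=None):
    """Return the sorted names (never the values) of AWS_* environment variables."""
    env = os.environ if environ is None else environ
    return sorted(name for name in env if name.upper().startswith("AWS_"))


def aws_config_files(home=None):
    """Return which of the conventional AWS/boto config files exist under `home`."""
    home = os.path.expanduser("~") if home is None else home
    candidates = [
        os.path.join(home, ".aws", "config"),
        os.path.join(home, ".aws", "credentials"),
        os.path.join(home, ".boto"),
    ]
    return [path for path in candidates if os.path.exists(path)]


print("AWS_ variables defined:", aws_env_names())
print("Config files found:", aws_config_files())
```

In a Jupyter notebook this should be run in the same kernel that runs the fsspec code, since the kernel's environment can differ from the shell's.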

aws_access_key_id, aws_secret_key_id, bucket_name, aws_region = us-east-1

I am running this from a Jupyter notebook.

omertechverx avatar Feb 27 '23 14:02 omertechverx

You have a variable called BUCKET_NAME?

> I am running this from a Jupyter notebook.

But is that notebook running within AWS, perhaps on EC2 or another virtual machine?

martindurant avatar Feb 27 '23 14:02 martindurant

I am running Jupyter on my local machine, not inside AWS. The bucket name variable is just for my own use.

omertechverx avatar Feb 27 '23 14:02 omertechverx

I think I have an idea.

https://s3.amazonaws.com/ is not the correct endpoint for us-east-1.

Try providing: endpoint_url = https://s3.us-east-1.amazonaws.com

Endpoints reference: https://docs.aws.amazon.com/general/latest/gr/s3.html

elephantum avatar Feb 28 '23 07:02 elephantum
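The regional endpoint form suggested above can be built programmatically. This is a minimal sketch, assuming a standard commercial AWS region (other partitions such as the China regions use a different domain suffix); the helper names are hypothetical. Note that the bucket name is not part of the endpoint URL.

```python
def regional_s3_endpoint(region):
    """Build the regional S3 endpoint URL for a standard AWS region."""
    return f"https://s3.{region}.amazonaws.com"


def client_kwargs_for(region):
    """Return a client_kwargs dict that pins both the region and its endpoint."""
    return {
        "region_name": region,
        "endpoint_url": regional_s3_endpoint(region),
    }


print(client_kwargs_for("us-east-1"))
```

The resulting dict can be passed as `client_kwargs=...` to `fsspec.filesystem("s3", ...)` in place of the one used earlier in the thread.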

> I think I have an idea.
>
> https://s3.amazonaws.com/ is not the correct endpoint for us-east-1.
>
> Try providing: endpoint_url = https://s3.us-east-1.amazonaws.com
>
> Endpoints reference: https://docs.aws.amazon.com/general/latest/gr/s3.html

I tried it; it does not work.

omertechverx avatar Feb 28 '23 08:02 omertechverx

https://github.com/fsspec/s3fs/issues/701#issuecomment-1480303225 suggests that setting cache_regions=True for S3FileSystem or specifying the region of the bucket as your default region might be what you need. Can you try?

martindurant avatar Mar 28 '23 14:03 martindurant
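The two options mentioned above could be tried roughly as follows. This is a sketch, not a confirmed fix: `s3_options` and `list_bucket` are hypothetical helpers, the credentials and bucket name are placeholders, and it assumes `cache_regions` is accepted by `S3FileSystem` as described in the linked comment. No endpoint_url is set in either variant.

```python
def s3_options(key, secret, region=None, cache_regions=False):
    """Build keyword arguments for fsspec.filesystem("s3", ...)."""
    opts = {"key": key, "secret": secret}
    if cache_regions:
        # Option 1: let s3fs look up and cache each bucket's region.
        opts["cache_regions"] = True
    if region is not None:
        # Option 2: make the bucket's region the default region.
        opts["client_kwargs"] = {"region_name": region}
    return opts


def list_bucket(bucket, **opts):
    """List a bucket's contents; requires fsspec and s3fs to be installed."""
    import fsspec

    fs = fsspec.filesystem("s3", **opts)
    return fs.ls(bucket)


# Placeholder values; substitute real credentials and the bucket's actual region.
print(s3_options("xxxxxx", "xxxxxxxxx", cache_regions=True))
print(s3_options("xxxxxx", "xxxxxxxxx", region="us-east-1"))
```

Calling `list_bucket("Bucket_name", **s3_options(...))` with each variant in turn would show whether either one makes the listing non-empty.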