S3 request fails with 400 when requesting parquet file from a different region
I have a Python script that requests Parquet files from S3, and those files can be in different regions.
When the default region is set to us-east-1 and the bucket is also in us-east-1, the request works as expected.
When the default region is set to us-east-1 and the bucket is in us-east-2, the request fails with Error: HTTP Error: HTTP GET error on <https url of parquet file> (HTTP 400).
AWS SDKs usually catch the redirect that S3 issues, connect to the correct endpoint, and download the file. DuckDB does not, and the error message is not helpful at all.
Ideally DuckDB should handle the redirect automatically, but if it does not, an actionable error message would make for a better user experience.
Workaround: explicitly set the region.
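A minimal sketch of that workaround from Python, assuming the region of each bucket is known in advance. The BUCKET_REGIONS mapping, the bucket names, and the helper names are hypothetical; conn can be any object with an execute(sql) method, for example a DuckDB connection from duckdb.connect() with httpfs loaded.

```python
from urllib.parse import urlparse

# Hypothetical mapping from bucket name to region; replace with your own buckets.
BUCKET_REGIONS = {
    "something-something-us-east-1": "us-east-1",
    "something-something-us-east-2": "us-east-2",
}


def bucket_of(s3_url: str) -> str:
    """Return the bucket name from an s3://bucket/key URL."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    return parsed.netloc


def query_with_region(conn, s3_url: str, default_region: str = "us-east-1"):
    """Set s3_region for the file's bucket, then run SUMMARIZE on the file.

    Buckets missing from BUCKET_REGIONS fall back to default_region.
    """
    region = BUCKET_REGIONS.get(bucket_of(s3_url), default_region)
    # The region string comes from our own mapping, not from user input.
    conn.execute(f"SET s3_region='{region}';")
    return conn.execute(f"SUMMARIZE '{s3_url}';")
```

With a real connection this would be called as query_with_region(duckdb.connect(), 's3://something-something-us-east-2/something.parquet'). Setting the region right before each query keeps one connection usable across buckets in different regions.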
Hi,
Thank you for the issue. Can you provide some code reproduction steps? It will make fixing your issue much easier. I have been querying parquet files in different regions consistently and have not seen anything like this.
Can you also tell us what version of DuckDB you are using?
Hi.
This is on 1.3.2
Default region us-east-1
INSTALL httpfs;
LOAD httpfs;
The following bucket is in us-east-2
summarize 's3://something-something-us-east-2/something.parquet'
HTTP Error: HTTP GET error on 'https://something-something-us-east-2.s3.amazonaws.com/something.parquet' (HTTP 400)
Bad Request - this can be caused by the S3 region being set incorrectly.
* Provided region is "us-east-1"
After running SET s3_region='us-east-2'; the query succeeds without issues.
This might just be a convenience thing, but I think the response to the failed request should include a header with the correct region to use.
I ran into this when running S3 inventory report queries in Python for multiple regions, and having to set the correct region manually each time was a bit annoying.
That said, I am grateful for the good work and don't think this is a major issue.
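For the multi-region case described above, one option is to look up each bucket's region before querying. The sketch below is an assumption, not something DuckDB does: as far as I know, S3 reports a bucket's region in the x-amz-bucket-region response header, including on error responses to an unauthenticated HEAD request. The function names are hypothetical and only the Python standard library is used.

```python
import urllib.error
import urllib.request


def region_from_headers(headers, fallback=None):
    """Pick the bucket region out of S3 response headers, if present."""
    return headers.get("x-amz-bucket-region", fallback)


def lookup_bucket_region(bucket: str, timeout: float = 10.0):
    """Send an unauthenticated HEAD request to the bucket's global endpoint.

    For private buckets S3 normally answers with an error status, but the
    response headers are still available on the raised HTTPError.
    Returns None if the header is missing.
    """
    request = urllib.request.Request(
        f"https://{bucket}.s3.amazonaws.com", method="HEAD"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return region_from_headers(response.headers)
    except urllib.error.HTTPError as err:
        return region_from_headers(err.headers)
```

The returned value could then be passed to SET s3_region before each query. Caching the result per bucket would avoid one extra request per file.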
This may be because we now hardcode the region in the endpoint, which we do to follow the Amazon documentation. Previously we would follow redirects if no region was given, but now that the region is included in the URL there are no more redirects.
https://docs.aws.amazon.com/general/latest/gr/s3.html
I will file an issue to add an option to allow S3 redirects (although that is not recommended by Amazon, if I remember correctly).