duckdb_aws

S3 request fails with 400 when requesting parquet file from a different region

Open · jnsaff opened this issue 11 months ago · 3 comments

I have a Python script that requests Parquet files from S3, and those files can be in different regions.

When the default region is set to us-east-1 and the bucket is also in us-east-1, the request works as expected.

When the default region is set to us-east-1 and the bucket is in us-east-2, the request fails with Error: HTTP Error: HTTP GET error on <https url of parquet file> (HTTP 400).

AWS SDKs usually catch the redirect that S3 issues, connect to the correct endpoint and download the file, but DuckDB does not, and its error message is not helpful at all.

Ideally, DuckDB should handle the redirect automatically, but if it does not, an actionable error message would make for a much better user experience.

Workaround: explicitly set the region.
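A minimal sketch of that workaround for a script that touches many files, assuming the caller already knows each bucket's region. BUCKET_REGIONS, bucket_of and statements_for are hypothetical names for illustration; the SQL they emit is the SET s3_region statement and the SUMMARIZE query used later in this thread.

```python
from urllib.parse import urlparse

# Hypothetical mapping maintained by the caller: bucket name -> AWS region.
BUCKET_REGIONS = {
    "something-something-us-east-2": "us-east-2",
}

# Region used for buckets missing from the mapping.
DEFAULT_REGION = "us-east-1"


def bucket_of(s3_url: str) -> str:
    """Return the bucket name from an s3://bucket/key URL."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    return parsed.netloc


def statements_for(s3_url: str) -> list[str]:
    """Build the SQL for one file: set the bucket's region, then query the file."""
    region = BUCKET_REGIONS.get(bucket_of(s3_url), DEFAULT_REGION)
    # Double any single quotes so both values are valid SQL string literals.
    region_lit = region.replace("'", "''")
    url_lit = s3_url.replace("'", "''")
    return [
        f"SET s3_region = '{region_lit}';",
        f"SUMMARIZE '{url_lit}';",
    ]
```

With the duckdb Python package, each statement could be passed in order to con.execute(...) on a connection from duckdb.connect() that has httpfs loaded. The design keeps the region lookup explicit, which is exactly the manual bookkeeping this issue asks DuckDB to take over.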

jnsaff · Feb 12 '25 14:02

Hi,

Thank you for the issue. Can you provide steps to reproduce it? That will make fixing it much easier. I regularly query Parquet files in different regions and have not seen anything like this.

Can you also tell us what version of DuckDB you are using?

Tmonster · Aug 08 '25 12:08

Hi.

This is on DuckDB 1.3.2.

Default region: us-east-1

INSTALL httpfs;
LOAD httpfs;

The following bucket is in us-east-2:

summarize 's3://something-something-us-east-2/something.parquet'

HTTP Error: HTTP GET error on 'https://something-something-us-east-2.s3.amazonaws.com/something.parquet' (HTTP 400)

Bad Request - this can be caused by the S3 region being set incorrectly.
* Provided region is "us-east-1"

After running SET s3_region='us-east-2'; the query succeeds without issues.

This might just be a convenience issue, but I think the failed response should include a header with the correct region to request.
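A sketch of what an actionable message could look like if the region were read from the response. It assumes the failed response carries x-amz-bucket-region, the header S3 documents on HeadBucket responses; whether DuckDB's failing GET actually receives it is an assumption. The function names are hypothetical, and this is not DuckDB's implementation.

```python
from typing import Mapping, Optional

# Header S3 documents on HeadBucket responses; assumed present here as well.
REGION_HEADER = "x-amz-bucket-region"


def region_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the bucket region advertised in the response headers, or None."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == REGION_HEADER:
            return value.strip() or None
    return None


def actionable_message(url: str, status: int, provided_region: str,
                       headers: Mapping[str, str]) -> str:
    """Build an error message that names the region to use, when it is known."""
    msg = f"HTTP GET error on '{url}' (HTTP {status})"
    actual = region_from_headers(headers)
    if actual and actual != provided_region:
        msg += (f"\nBucket is in region \"{actual}\" but the provided region is "
                f"\"{provided_region}\"; try SET s3_region='{actual}';")
    return msg
```

When the header is absent, the message falls back to the plain HTTP error, so the sketch never guesses a region.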

I ran into this when running S3 Inventory report queries in Python across multiple regions, and tracking the correct region for each bucket manually was a bit annoying.
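For the multi-region inventory case, one option is to register one S3 secret per bucket instead of switching s3_region before every query. This is a sketch that assumes DuckDB's secrets manager, where a secret's SCOPE restricts it to matching paths, and the aws extension's credential_chain provider with its REGION overridden. Check the secrets documentation for your DuckDB version before relying on this syntax. The bucket names and helper functions are hypothetical.

```python
# Hypothetical input: inventory buckets to query, mapped to their regions.
INVENTORY_BUCKETS = {
    "inventory-bucket-a": "us-east-1",
    "inventory-bucket-b": "us-east-2",
}


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def scoped_secret_statements(buckets: dict[str, str]) -> list[str]:
    """Emit one CREATE SECRET per bucket, scoped to that bucket's s3:// prefix."""
    stmts = []
    # Sort so the generated secret names are stable across runs.
    for i, (bucket, region) in enumerate(sorted(buckets.items())):
        stmts.append(
            f"CREATE OR REPLACE SECRET s3_region_{i} ("
            f"TYPE s3, PROVIDER credential_chain, "
            f"REGION {sql_literal(region)}, "
            f"SCOPE {sql_literal('s3://' + bucket)});"
        )
    return stmts
```

Once the secrets exist, queries against different buckets can be mixed freely, because each request should pick the secret whose scope matches its path rather than a single global region.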

That said, I am grateful for the good work and don't think this is a major issue.

jnsaff · Sep 04 '25 13:09

This may be because we now hardcode the region in the endpoint, following the Amazon documentation. Previously, if no region was given, we would follow the redirects; now that the region is part of the URL, there are no redirects to follow.
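To make the distinction concrete, here are the two virtual-hosted-style hostname shapes from the AWS S3 endpoint documentation: the legacy global form with no region, and the regional form with the region in the hostname. This illustrates the URL formats only. It is not DuckDB's endpoint-building code, and the function names are hypothetical.

```python
from urllib.parse import quote


def global_endpoint_url(bucket: str, key: str) -> str:
    """Legacy global virtual-hosted-style URL: no region in the hostname."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


def regional_endpoint_url(bucket: str, key: str, region: str) -> str:
    """Regional virtual-hosted-style URL: the region is part of the hostname."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"
```

With the regional form, the client has already committed to a region before sending the request, so a wrong s3_region sends the request to the wrong regional endpoint instead of relying on S3 to redirect it.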

https://docs.aws.amazon.com/general/latest/gr/s3.html

I will file an issue to add an option to follow S3 redirects (although, if I remember correctly, Amazon does not recommend relying on them).

Tmonster · Sep 04 '25 16:09