steampipe-plugin-aws icon indicating copy to clipboard operation
steampipe-plugin-aws copied to clipboard

aws_s3_bucket should use HeadBucket instead of GetBucketLocation

Open e-gineer opened this issue 3 years ago • 13 comments

From https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketLocation.html:

For requests made using AWS Signature Version 4 (SigV4), we recommend that you use HeadBucket to return the bucket Region instead of GetBucketLocation.

https://github.com/turbot/steampipe-plugin-aws/blob/6418b0bf93cd67ad753cca67168fd0fe1a4ff082/aws/table_aws_s3_bucket.go#L338-L342

e-gineer avatar Feb 14 '23 15:02 e-gineer

'This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.'

github-actions[bot] avatar Apr 16 '23 23:04 github-actions[bot]

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Jun 16 '23 23:06 github-actions[bot]

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Aug 18 '23 23:08 github-actions[bot]

Took a quick look at using HeadBucket operation - however I kept receiving a 301 error.

operation error S3: HeadBucket, https response error StatusCode: 301, RequestID: 6ZC4YNZX6Z7MDFR8, HostID: eI5x2FvRBbAQreiEqD2lpJlZvJh00lOWHPtcVRw9Q24esThCm8UxrzoPorEo5eIk8HK+ce0D5yU=, api error MovedPermanently: Moved Permanently

graza-io avatar Sep 25 '23 14:09 graza-io

Can't you see what the location header was set to in the 301 response?

Edit: too bad AWS do not return the standard Location header here.

pdecat avatar Sep 25 '23 15:09 pdecat

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Nov 24 '23 23:11 github-actions[bot]

Using HeadBucket instead makes sense, because GetBucketLocation can return a special string EU. See it specified in the API docs of this method: image

There is no such region as "EU", so steampipe AWS plugin right now has this crutch: https://github.com/turbot/steampipe-plugin-aws/blob/b71914ee85a7816fc5cc6eaf65be61f344a0a21d/aws/table_aws_s3_bucket.go#L422-L429

I'm not sure how correct this is. I couldn't find any docs that actually say that EU == eu-west-1, but it looks like it practically is?

Veetaha avatar Dec 21 '23 02:12 Veetaha

I'm not sure how correct this is. I couldn't find any docs that actually say that EU == eu-west-1, but it looks like it practically is?

It sort of does https://github.com/EverythingMe/ncdu-s3/pull/4 🙂

pdecat avatar Dec 21 '23 02:12 pdecat

FYI - work done here may also help with #1713

e-gineer avatar Feb 19 '24 16:02 e-gineer

Hi, FYI I'm currently working on implementing this, and investigating the best options.

Took a quick look at using HeadBucket operation - however I kept receiving a 301 error.

Indeed, the 301 happens when invoking HeadBucket on an endpoint region that does not match the bucket region, the that endpoint does not include the standard HTTP Location header.

When using the aws s3api head-bucket command, the AWS CLI extracts the bucket region from the x-amz-bucket-region header (documented here) and retries the request on that region:

AWS_DEFAULT_REGION=eu-west-1 aws --debug s3api head-bucket --bucket commoncrawl |& grep -E "urllib3.connectionpool|x-amz-bucket"
2024-02-20 13:24:16,989 - MainThread - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): commoncrawl.s3.eu-west-1.amazonaws.com:443
2024-02-20 13:24:17,116 - MainThread - urllib3.connectionpool - DEBUG - https://commoncrawl.s3.eu-west-1.amazonaws.com:443 "HEAD / HTTP/1.1" 301 0
2024-02-20 13:24:17,116 - MainThread - botocore.parsers - DEBUG - Response headers: {'x-amz-bucket-region': 'us-east-1', 'x-amz-request-id': '4AD4FCR5BJ7N91FD', 'x-amz-id-2': 'wmDu9nX5k6L818sp+cEYM31xBFKlhrotqRnbhwqhCqR2YWqRb3ghB/IbWeI4cwBamCt16/+0Y1U=', 'Content-Type': 'application/xml', 'Date': 'Tue, 20 Feb 2024 12:24:16 GMT', 'Server': 'AmazonS3'}
2024-02-20 13:24:17,121 - MainThread - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): commoncrawl.s3.us-east-1.amazonaws.com:443
2024-02-20 13:24:17,556 - MainThread - urllib3.connectionpool - DEBUG - https://commoncrawl.s3.us-east-1.amazonaws.com:443 "HEAD / HTTP/1.1" 200 0
2024-02-20 13:24:17,557 - MainThread - botocore.parsers - DEBUG - Response headers: {'x-amz-id-2': 'T8ZAgIx9faNGXQPOE+XMdhWt4ZWa6tSNZyCETNBbfHVJhBUK1buLQSh0+8+49xkolCdct52VT74=', 'x-amz-request-id': '4AD5FT0YS5TV2HER', 'Date': 'Tue, 20 Feb 2024 12:24:18 GMT', 'x-amz-bucket-region': 'us-east-1', 'x-amz-access-point-alias': 'false', 'Content-Type': 'application/xml', 'Server': 'AmazonS3'}

With the AWS SDK for Go v2, it is currently not possible to retry that call as the SDK does not expose the header in case of 301 HTTP responses. And adding that feature has been rejected by the development team.

As we only want the region information, there's no need to retry that call, making simple unauthenticated requests is enough to get the desired header:

# curl -HEAD -i https://commoncrawl.s3.amazonaws.com
HTTP/1.1 403 Forbidden
x-amz-bucket-region: us-east-1
x-amz-request-id: ****
x-amz-id-2: ****
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Tue, 20 Feb 2024 12:33:07 GMT
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>****</RequestId><HostId>****</HostId></Error>[

Even with path style usage:

# curl -HEAD -i https://s3.amazonaws.com/commoncrawl
HTTP/1.1 403 Forbidden
x-amz-bucket-region: us-east-1
x-amz-request-id: ****
x-amz-id-2: ****
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Tue, 20 Feb 2024 12:46:36 GMT
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>****</RequestId><HostId>****</HostId></Error>

See https://github.com/aws/aws-sdk-go/issues/356#issuecomment-132707340 and https://github.com/aws/aws-sdk-go/issues/720#issuecomment-243891223

Right now, I believe the way to go is to perform such plain HTTP HEAD requests to get the header value.

pdecat avatar Feb 20 '24 12:02 pdecat

making simple unauthenticated requests is enough to get the desired header

That is surprising... Also, I'm not sure how much quota AWS provides for the rate of unauthenticated requests. It's probably lower? And the question is wether the response still contains the bucket region header if a RateExceeded or Throttling error is returned

Veetaha avatar Feb 20 '24 13:02 Veetaha

There's caching in the steampipe query data connection manager to avoid requesting the region to often for a single bucket. I've not faced throttling on that HEAD method yet, will perform more load testing ASAP to see how it behaves.

FWIW, I'm almost done before submitting my PR.

pdecat avatar Feb 20 '24 13:02 pdecat

Submitted https://github.com/turbot/steampipe-plugin-aws/pull/2082, PTAL :pray:

pdecat avatar Feb 20 '24 15:02 pdecat