steampipe-plugin-aws
steampipe-plugin-aws copied to clipboard
feat: use HTTP HEAD method to retrieve S3 bucket region instead of GetBucketLocation
This PR replaces usage of GetBucketLocation
REST API calls by plain HTTP HEAD
requests to retrieve S3 bucket region.
Resolves #1586
~Note: this based on the branch of https://github.com/turbot/steampipe-plugin-aws/pull/2080 to allow running the tests/aws_s3_bucket/
test.~
Test results:
# node tint.js tests/aws_s3_bucket/
No env file present for the current environment: staging
Falling back to .env config
No env file present for the current environment: staging
customEnv TURBOT_TEST_EXPECTED_TIMEOUT undefined
SETUP: tests/aws_s3_bucket []
PRETEST: tests/aws_s3_bucket
TEST: tests/aws_s3_bucket
Running terraform
data.aws_region.alternate: Reading...
data.aws_region.alternate: Read complete after 0s [id=us-east-1]
data.aws_canonical_user_id.current_user: Reading...
data.aws_region.primary: Reading...
data.aws_caller_identity.current: Reading...
data.aws_partition.current: Reading...
data.aws_region.primary: Read complete after 0s [id=us-east-2]
data.aws_partition.current: Read complete after 0s [id=aws]
data.aws_caller_identity.current: Read complete after 0s [id=012345678901]
data.null_data_source.resource: Reading...
data.null_data_source.resource: Read complete after 0s [id=static]
data.aws_canonical_user_id.current_user: Read complete after 1s [id=012345678901012345678901012345678901]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# aws_kms_key.mykey will be created
+ resource "aws_kms_key" "mykey" {
+ arn = (known after apply)
+ bypass_policy_lockout_safety_check = false
+ customer_master_key_spec = "SYMMETRIC_DEFAULT"
+ deletion_window_in_days = 10
+ description = "This key is used to encrypt bucket objects"
+ enable_key_rotation = false
+ id = (known after apply)
+ is_enabled = true
+ key_id = (known after apply)
+ key_usage = "ENCRYPT_DECRYPT"
+ multi_region = (known after apply)
+ policy = (known after apply)
+ tags_all = (known after apply)
}
# aws_s3_bucket.named_test_resource will be created
+ resource "aws_s3_bucket" "named_test_resource" {
+ acceleration_status = (known after apply)
+ acl = (known after apply)
+ arn = (known after apply)
+ bucket = "turbottest46000"
+ bucket_domain_name = (known after apply)
+ bucket_prefix = (known after apply)
+ bucket_regional_domain_name = (known after apply)
+ force_destroy = false
+ hosted_zone_id = (known after apply)
+ id = (known after apply)
+ object_lock_enabled = (known after apply)
+ policy = (known after apply)
+ region = (known after apply)
+ request_payer = (known after apply)
+ tags = {
+ "name" = "turbottest46000"
}
+ tags_all = {
+ "name" = "turbottest46000"
}
+ website_domain = (known after apply)
+ website_endpoint = (known after apply)
+ cors_rule {
+ allowed_headers = [
+ "*",
]
+ allowed_methods = [
+ "PUT",
+ "POST",
]
+ allowed_origins = [
+ "https://s3-website-test.hashicorp.com",
]
+ expose_headers = [
+ "ETag",
]
+ max_age_seconds = 3000
}
+ lifecycle_rule {
+ enabled = true
+ id = "log"
+ prefix = "log/"
+ tags = {
+ "autoclean" = "true"
+ "rule" = "log"
}
+ expiration {
+ days = 90
}
+ transition {
+ days = 30
+ storage_class = "STANDARD_IA"
}
+ transition {
+ days = 60
+ storage_class = "GLACIER"
}
}
+ lifecycle_rule {
+ enabled = true
+ id = "tmp"
+ prefix = "tmp/"
+ expiration {
+ date = "2022-01-12"
}
}
+ object_lock_configuration {
+ object_lock_enabled = "Enabled"
}
+ versioning {
+ enabled = true
+ mfa_delete = false
}
}
# aws_s3_bucket_acl.named_test_resource will be created
+ resource "aws_s3_bucket_acl" "named_test_resource" {
+ bucket = (known after apply)
+ id = (known after apply)
+ access_control_policy {
+ grant {
+ permission = "FULL_CONTROL"
+ grantee {
+ display_name = (known after apply)
+ id = "012345678901012345678901012345678901"
+ type = "CanonicalUser"
}
}
+ owner {
+ display_name = (known after apply)
+ id = "012345678901012345678901012345678901"
}
}
}
# aws_s3_bucket_ownership_controls.named_test_resource will be created
+ resource "aws_s3_bucket_ownership_controls" "named_test_resource" {
+ bucket = (known after apply)
+ id = (known after apply)
+ rule {
+ object_ownership = "BucketOwnerPreferred"
}
}
# aws_s3_bucket_policy.b will be created
+ resource "aws_s3_bucket_policy" "b" {
+ bucket = (known after apply)
+ id = (known after apply)
+ policy = (known after apply)
}
Plan: 5 to add, 0 to change, 0 to destroy.
Changes to Outputs:
+ account_id = "012345678901"
+ aws_partition = "aws"
+ canonical_user_id = "012345678901012345678901012345678901"
+ kms_key_id = (known after apply)
+ resource_aka = (known after apply)
+ resource_name = "turbottest46000"
aws_kms_key.mykey: Creating...
aws_s3_bucket.named_test_resource: Creating...
aws_kms_key.mykey: Creation complete after 1s [id=0aa7048d-a3ac-4a97-afe8-b552b1af2b18]
aws_s3_bucket.named_test_resource: Creation complete after 4s [id=turbottest46000]
aws_s3_bucket_ownership_controls.named_test_resource: Creating...
aws_s3_bucket_policy.b: Creating...
aws_s3_bucket_ownership_controls.named_test_resource: Creation complete after 0s [id=turbottest46000]
aws_s3_bucket_acl.named_test_resource: Creating...
aws_s3_bucket_policy.b: Creation complete after 1s [id=turbottest46000]
aws_s3_bucket_acl.named_test_resource: Creation complete after 1s [id=turbottest46000]
Warning: Deprecated
with data.null_data_source.resource,
on variables.tf line 48, in data "null_data_source" "resource":
48: data "null_data_source" "resource" {
The null_data_source was historically used to construct intermediate values
to re-use elsewhere in configuration, the same can now be achieved using
locals or the terraform_data resource type in Terraform 1.4 and later.
(and one more similar warning elsewhere)
Warning: Argument is deprecated
with aws_s3_bucket.named_test_resource,
on variables.tf line 59, in resource "aws_s3_bucket" "named_test_resource":
59: resource "aws_s3_bucket" "named_test_resource" {
Use the aws_s3_bucket_lifecycle_configuration resource instead
(and 14 more similar warnings elsewhere)
Apply complete! Resources: 5 added, 0 changed, 0 destroyed.
Outputs:
account_id = "012345678901"
aws_partition = "aws"
canonical_user_id = "012345678901012345678901012345678901"
kms_key_id = "arn:aws:kms:us-east-2:012345678901:key/0aa7048d-a3ac-4a97-afe8-b552b1af2b18"
resource_aka = "arn:aws:s3:::turbottest46000"
resource_name = "turbottest46000"
Running SQL query: test-get-query.sql
[
{
"akas": [
"arn:aws:s3:::turbottest46000"
]
}
]
✔ PASSED
Running SQL query: test-hydrate-query.sql
[
{
"acl": {
"Grants": [
{
"Grantee": {
"DisplayName": null,
"EmailAddress": null,
"ID": "012345678901012345678901012345678901",
"Type": "CanonicalUser",
"URI": null
},
"Permission": "FULL_CONTROL"
}
],
"Owner": {
"DisplayName": null,
"ID": "012345678901012345678901012345678901"
}
},
"bucket_policy_is_public": false,
"logging": null,
"name": "turbottest46000",
"object_lock_configuration": {
"ObjectLockEnabled": "Enabled",
"Rule": null
},
"region": "us-east-2",
"replication": null,
"versioning_enabled": true,
"versioning_mfa_delete": false
}
]
✔ PASSED
Running SQL query: test-list-query.sql
[
{
"akas": [
"arn:aws:s3:::turbottest46000"
],
"bucket_policy_is_public": false,
"logging": null,
"name": "turbottest46000",
"partition": "aws",
"tags": {
"name": "turbottest46000"
},
"tags_src": [
{
"Key": "name",
"Value": "turbottest46000"
}
],
"title": "turbottest46000",
"versioning_enabled": true,
"versioning_mfa_delete": false
}
]
✔ PASSED
POSTTEST: tests/aws_s3_bucket
TEARDOWN: tests/aws_s3_bucket
SUMMARY:
1/1 passed.
Hey @pdecat , thanks for raising this PR! We have it on our radar to review, along with https://github.com/turbot/steampipe-plugin-aws/pull/2080.
Rebased on main following merge of https://github.com/turbot/steampipe-plugin-aws/pull/2080
@pdecat Cross-posting here for better visibility - Did you notice any performance improvements, or other benefits?
We had originally opened that issue because AWS API docs said to use HeadBucket
instead, but it wasn’t clear if there were any benefits other than moving off of an older API method.
If we need to manually handle what the GetBucketLocation
API call does already, like compose URLs for different partitions, I’m wondering if it's beneficial to switch over.
Hi @cbruno10,
@pdecat Cross-posting here for better visibility - Did you notice any performance improvements, or other benefits? We had originally opened that issue because AWS API docs said to use
HeadBucket
instead, but it wasn’t clear if there were any benefits other than moving off of an older API method.If we need to manually handle what the
GetBucketLocation
API call does already, like compose URLs for different partitions, I’m wondering if it's beneficial to switch over.
I believe I've mentioned this somewhere, probably in Slack, but the main benefit is to avoid the errors that occur if the GetBucketLocation
API is invoked on a region endpoint that doesn't match the bucket location. The SDK doesn't follow the non standard header that is returned by the API in this case (not Location
).
I'll address all the other comments ASAP.
@pdecat Is there a specific query or set of queries where the plugin would call GetBucketLocation
on a region other than the bucket's region? I don't know if I've hit this before, but I may not have been running the specific set to hit this use case.
Found my comment investigating the errors with GetBucketLocation
that happen when hitting the wrong region with the AWS Go SDK that the Steampipe AWS plugin is using:
https://github.com/turbot/steampipe-plugin-aws/issues/1586#issuecomment-1954154456
With Steampipe, here's the error message that is logged when AWS_DEFAULT_REGION
is set to a wrong region:
rpc error: code = Unknown desc = operation error S3: GetBucketLocation, https response error StatusCode: 403, RequestID: MV4QH9V0Z3YMRZEV, HostID: CLdJF******K4L198mlFM8SWneQ6VE=, api error AccessDenied: Access Denied
In Slack, I've found these comments (copying it as it will soon be inaccessible):
Typically, I can reproduce the issue with the AWS CLI:
# AWS_DEFAULT_REGION=us-east-1 aws --profile my-profile s3api get-bucket-location --bucket my-bucket
An error occurred (AccessDenied) when calling the GetBucketLocation operation: Access Denied
# AWS_DEFAULT_REGION=eu-west-3 aws --profile my-profile s3api get-bucket-location --bucket my-bucket
{
"LocationConstraint": "eu-west-3"
}
# AWS_DEFAULT_REGION=us-east-1 aws --profile my-profile s3api head-bucket --bucket my-bucket
{
"BucketRegion": "eu-west-3",
"AccessPointAlias": false
}
https://turbot-community.slack.com/archives/C044P668806/p1708359173460459?thread_ts=1708357736.103139&cid=C044P668806
I've tried to update the S3 tests to reproduce the issue, but it happens the GetBucketLocation access denied error only reproduces under specific conditions, e.g. if the requester is not the owner of the bucket:
GetBucketLocation requires the requester to be the owner of the bucket.
https://github.com/aws/aws-sdk-go/issues/720#issuecomment-229208282
The S3 team got back with me and suggested the best API to use is HEAD bucket. GetBucketLocation uses a more complex permissions model where HeadBucket can be called by anyone.
https://github.com/aws/aws-sdk-go/issues/720#issuecomment-243891223
https://turbot-community.slack.com/archives/C044P668806/p1708367391732849?thread_ts=1708357736.103139&cid=C044P668806
Here's a related issue mentioning the error I'm facing https://github.com/turbot/steampipe-plugin-aws/issues/1713
FYI, instead of using http.Head
you can also use the manager.GetBucketRegion
function in AWS SDK for Go v2.
Hi @vpartington,
FYI, instead of using
http.Head
you can also use themanager.GetBucketRegion
function in AWS SDK for Go v2.
Interesting, this helper function does seem to do an unauthenticated HTTP HEAD request too (which is somewhat confusing because there's an actual HeadBucket API and it states All HeadBucket requests must be authenticated and signed by using IAM credentials).
And it probably handles other uncommon cases as described in https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/s3#HeadBucketInput.
Anyway, I've not faced any issue yet since I've switched to using HTTP HEAD two months ago.
And it works even if the bucket is not yours:
# curl -HEAD -i https://s3.amazonaws.com/commoncrawl
HTTP/1.1 403 Forbidden
x-amz-bucket-region: us-east-1
x-amz-request-id: J0AD186X1H9EK40X
x-amz-id-2: OrTMgtyo12JoQRDHJAAEXmBWCb78wpt6XMit/Hv3lw7+LlAKr0ywEL1DtKGHlq733ojIHtJ0kSs=
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Fri, 12 Apr 2024 10:17:30 GMT
Server: AmazonS3
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>J0AD186X1H9EK40X</RequestId><HostId>OrTMgtyo12JoQRDHJAAEXmBWCb78wpt6XMit/Hv3lw7+LlAKr0ywEL1DtKGHlq733ojIHtJ0kSs=</HostId></Error>
Note the x-amz-bucket-region: us-east-1
header.
The HeadBucket
docs are indeed very confusing. I spent the last few days banging my head against the wall about this. Yesterday I settled on doing a http.Head
, just like you do.
Just as I was writing a comment for why I do http.Head
in our code base this morning, I found that manager.GetBucketRegion
method. It does seem to handle some exotic use cases that I don't care about.
For me having less code to test or maintain is the reason I threw away my code. But then I have not had my code running successfully for a while, so I would also stick with what I've got if I were you.
Just thought I'd reach out because I saw you had struggled with the same problem in this MR.
Take care!
@pdecat Sorry for the long response time (again)! I've left a few follow-up questions/suggestions, can you please have a look when you get a chance? Thanks!
Hi @cbruno10, I've implemented the requested changes. PTAL :)
Can confirm everything also works fine with the use of manager.GetBucketRegion()
instead of plain HTTP HEAD requests.
Thanks so much @pdecat for this PR, and also to @vpartington for sharing info on manager.GetBucketRegion()
!