steampipe-plugin-aws icon indicating copy to clipboard operation
steampipe-plugin-aws copied to clipboard

feat: use HTTP HEAD method to retrieve S3 bucket region instead of GetBucketLocation

Open pdecat opened this issue 1 year ago • 12 comments

This PR replaces usage of GetBucketLocation REST API calls by plain HTTP HEAD requests to retrieve S3 bucket region.

Resolves #1586

~Note: this based on the branch of https://github.com/turbot/steampipe-plugin-aws/pull/2080 to allow running the tests/aws_s3_bucket/ test.~

pdecat avatar Feb 20 '24 15:02 pdecat

Test results:

# node tint.js tests/aws_s3_bucket/
No env file present for the current environment:  staging
 Falling back to .env config
No env file present for the current environment:  staging
customEnv TURBOT_TEST_EXPECTED_TIMEOUT undefined

SETUP: tests/aws_s3_bucket []

PRETEST: tests/aws_s3_bucket

TEST: tests/aws_s3_bucket
Running terraform
data.aws_region.alternate: Reading...
data.aws_region.alternate: Read complete after 0s [id=us-east-1]
data.aws_canonical_user_id.current_user: Reading...
data.aws_region.primary: Reading...
data.aws_caller_identity.current: Reading...
data.aws_partition.current: Reading...
data.aws_region.primary: Read complete after 0s [id=us-east-2]
data.aws_partition.current: Read complete after 0s [id=aws]
data.aws_caller_identity.current: Read complete after 0s [id=012345678901]
data.null_data_source.resource: Reading...
data.null_data_source.resource: Read complete after 0s [id=static]
data.aws_canonical_user_id.current_user: Read complete after 1s [id=012345678901012345678901012345678901]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_kms_key.mykey will be created
  + resource "aws_kms_key" "mykey" {
      + arn                                = (known after apply)
      + bypass_policy_lockout_safety_check = false
      + customer_master_key_spec           = "SYMMETRIC_DEFAULT"
      + deletion_window_in_days            = 10
      + description                        = "This key is used to encrypt bucket objects"
      + enable_key_rotation                = false
      + id                                 = (known after apply)
      + is_enabled                         = true
      + key_id                             = (known after apply)
      + key_usage                          = "ENCRYPT_DECRYPT"
      + multi_region                       = (known after apply)
      + policy                             = (known after apply)
      + tags_all                           = (known after apply)
    }

  # aws_s3_bucket.named_test_resource will be created
  + resource "aws_s3_bucket" "named_test_resource" {
      + acceleration_status         = (known after apply)
      + acl                         = (known after apply)
      + arn                         = (known after apply)
      + bucket                      = "turbottest46000"
      + bucket_domain_name          = (known after apply)
      + bucket_prefix               = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + object_lock_enabled         = (known after apply)
      + policy                      = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + tags                        = {
          + "name" = "turbottest46000"
        }
      + tags_all                    = {
          + "name" = "turbottest46000"
        }
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)

      + cors_rule {
          + allowed_headers = [
              + "*",
            ]
          + allowed_methods = [
              + "PUT",
              + "POST",
            ]
          + allowed_origins = [
              + "https://s3-website-test.hashicorp.com",
            ]
          + expose_headers  = [
              + "ETag",
            ]
          + max_age_seconds = 3000
        }

      + lifecycle_rule {
          + enabled = true
          + id      = "log"
          + prefix  = "log/"
          + tags    = {
              + "autoclean" = "true"
              + "rule"      = "log"
            }

          + expiration {
              + days = 90
            }

          + transition {
              + days          = 30
              + storage_class = "STANDARD_IA"
            }
          + transition {
              + days          = 60
              + storage_class = "GLACIER"
            }
        }
      + lifecycle_rule {
          + enabled = true
          + id      = "tmp"
          + prefix  = "tmp/"

          + expiration {
              + date = "2022-01-12"
            }
        }

      + object_lock_configuration {
          + object_lock_enabled = "Enabled"
        }

      + versioning {
          + enabled    = true
          + mfa_delete = false
        }
    }

  # aws_s3_bucket_acl.named_test_resource will be created
  + resource "aws_s3_bucket_acl" "named_test_resource" {
      + bucket = (known after apply)
      + id     = (known after apply)

      + access_control_policy {
          + grant {
              + permission = "FULL_CONTROL"

              + grantee {
                  + display_name = (known after apply)
                  + id           = "012345678901012345678901012345678901"
                  + type         = "CanonicalUser"
                }
            }
          + owner {
              + display_name = (known after apply)
              + id           = "012345678901012345678901012345678901"
            }
        }
    }

  # aws_s3_bucket_ownership_controls.named_test_resource will be created
  + resource "aws_s3_bucket_ownership_controls" "named_test_resource" {
      + bucket = (known after apply)
      + id     = (known after apply)

      + rule {
          + object_ownership = "BucketOwnerPreferred"
        }
    }

  # aws_s3_bucket_policy.b will be created
  + resource "aws_s3_bucket_policy" "b" {
      + bucket = (known after apply)
      + id     = (known after apply)
      + policy = (known after apply)
    }

Plan: 5 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + account_id        = "012345678901"
  + aws_partition     = "aws"
  + canonical_user_id = "012345678901012345678901012345678901"
  + kms_key_id        = (known after apply)
  + resource_aka      = (known after apply)
  + resource_name     = "turbottest46000"
aws_kms_key.mykey: Creating...
aws_s3_bucket.named_test_resource: Creating...
aws_kms_key.mykey: Creation complete after 1s [id=0aa7048d-a3ac-4a97-afe8-b552b1af2b18]
aws_s3_bucket.named_test_resource: Creation complete after 4s [id=turbottest46000]
aws_s3_bucket_ownership_controls.named_test_resource: Creating...
aws_s3_bucket_policy.b: Creating...
aws_s3_bucket_ownership_controls.named_test_resource: Creation complete after 0s [id=turbottest46000]
aws_s3_bucket_acl.named_test_resource: Creating...
aws_s3_bucket_policy.b: Creation complete after 1s [id=turbottest46000]
aws_s3_bucket_acl.named_test_resource: Creation complete after 1s [id=turbottest46000]

Warning: Deprecated

  with data.null_data_source.resource,
  on variables.tf line 48, in data "null_data_source" "resource":
  48: data "null_data_source" "resource" {

The null_data_source was historically used to construct intermediate values
to re-use elsewhere in configuration, the same can now be achieved using
locals or the terraform_data resource type in Terraform 1.4 and later.

(and one more similar warning elsewhere)

Warning: Argument is deprecated

  with aws_s3_bucket.named_test_resource,
  on variables.tf line 59, in resource "aws_s3_bucket" "named_test_resource":
  59: resource "aws_s3_bucket" "named_test_resource" {

Use the aws_s3_bucket_lifecycle_configuration resource instead

(and 14 more similar warnings elsewhere)

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

account_id = "012345678901"
aws_partition = "aws"
canonical_user_id = "012345678901012345678901012345678901"
kms_key_id = "arn:aws:kms:us-east-2:012345678901:key/0aa7048d-a3ac-4a97-afe8-b552b1af2b18"
resource_aka = "arn:aws:s3:::turbottest46000"
resource_name = "turbottest46000"

Running SQL query: test-get-query.sql
[
  {
    "akas": [
      "arn:aws:s3:::turbottest46000"
    ]
  }
]
✔ PASSED

Running SQL query: test-hydrate-query.sql
[
  {
    "acl": {
      "Grants": [
        {
          "Grantee": {
            "DisplayName": null,
            "EmailAddress": null,
            "ID": "012345678901012345678901012345678901",
            "Type": "CanonicalUser",
            "URI": null
          },
          "Permission": "FULL_CONTROL"
        }
      ],
      "Owner": {
        "DisplayName": null,
        "ID": "012345678901012345678901012345678901"
      }
    },
    "bucket_policy_is_public": false,
    "logging": null,
    "name": "turbottest46000",
    "object_lock_configuration": {
      "ObjectLockEnabled": "Enabled",
      "Rule": null
    },
    "region": "us-east-2",
    "replication": null,
    "versioning_enabled": true,
    "versioning_mfa_delete": false
  }
]
✔ PASSED

Running SQL query: test-list-query.sql
[
  {
    "akas": [
      "arn:aws:s3:::turbottest46000"
    ],
    "bucket_policy_is_public": false,
    "logging": null,
    "name": "turbottest46000",
    "partition": "aws",
    "tags": {
      "name": "turbottest46000"
    },
    "tags_src": [
      {
        "Key": "name",
        "Value": "turbottest46000"
      }
    ],
    "title": "turbottest46000",
    "versioning_enabled": true,
    "versioning_mfa_delete": false
  }
]
✔ PASSED

POSTTEST: tests/aws_s3_bucket

TEARDOWN: tests/aws_s3_bucket

SUMMARY:

1/1 passed.

pdecat avatar Feb 20 '24 15:02 pdecat

Hey @pdecat , thanks for raising this PR! We have it on our radar to review, along with https://github.com/turbot/steampipe-plugin-aws/pull/2080.

cbruno10 avatar Feb 26 '24 19:02 cbruno10

Rebased on main following merge of https://github.com/turbot/steampipe-plugin-aws/pull/2080

pdecat avatar Mar 27 '24 06:03 pdecat

@pdecat Cross-posting here for better visibility - Did you notice any performance improvements, or other benefits? We had originally opened that issue because AWS API docs said to use HeadBucket instead, but it wasn’t clear if there were any benefits other than moving off of an older API method.

If we need to manually handle what the GetBucketLocation API call does already, like compose URLs for different partitions, I’m wondering if it's beneficial to switch over.

cbruno10 avatar Apr 11 '24 19:04 cbruno10

Hi @cbruno10,

@pdecat Cross-posting here for better visibility - Did you notice any performance improvements, or other benefits? We had originally opened that issue because AWS API docs said to use HeadBucket instead, but it wasn’t clear if there were any benefits other than moving off of an older API method.

If we need to manually handle what the GetBucketLocation API call does already, like compose URLs for different partitions, I’m wondering if it's beneficial to switch over.

I believe I've mentioned this somewhere, probably in Slack, but the main benefit is to avoid the errors that occur if the GetBucketLocation API is invoked on a region endpoint that doesn't match the bucket location. The SDK doesn't follow the non standard header that is returned by the API in this case (not Location).

I'll address all the other comments ASAP.

pdecat avatar Apr 11 '24 20:04 pdecat

@pdecat Is there a specific query or set of queries where the plugin would call GetBucketLocation on a region other than the bucket's region? I don't know if I've hit this before, but I may not have been running the specific set to hit this use case.

cbruno10 avatar Apr 12 '24 01:04 cbruno10

Found my comment investigating the errors with GetBucketLocation that happen when hitting the wrong region with the AWS Go SDK that the Steampipe AWS plugin is using: https://github.com/turbot/steampipe-plugin-aws/issues/1586#issuecomment-1954154456

With Steampipe, here's the error message that is logged when AWS_DEFAULT_REGION is set to a wrong region:

rpc error: code = Unknown desc = operation error S3: GetBucketLocation, https response error StatusCode: 403, RequestID: MV4QH9V0Z3YMRZEV, HostID: CLdJF******K4L198mlFM8SWneQ6VE=, api error AccessDenied: Access Denied

In Slack, I've found these comments (copying it as it will soon be inaccessible):

Typically, I can reproduce the issue with the AWS CLI:

# AWS_DEFAULT_REGION=us-east-1 aws --profile my-profile s3api get-bucket-location --bucket my-bucket

An error occurred (AccessDenied) when calling the GetBucketLocation operation: Access Denied

# AWS_DEFAULT_REGION=eu-west-3 aws --profile my-profile s3api get-bucket-location --bucket my-bucket
{
    "LocationConstraint": "eu-west-3"
}

# AWS_DEFAULT_REGION=us-east-1 aws --profile my-profile s3api head-bucket --bucket my-bucket
{
    "BucketRegion": "eu-west-3",
    "AccessPointAlias": false
}

https://turbot-community.slack.com/archives/C044P668806/p1708359173460459?thread_ts=1708357736.103139&cid=C044P668806

I've tried to update the S3 tests to reproduce the issue, but it happens the GetBucketLocation access denied error only reproduces under specific conditions, e.g. if the requester is not the owner of the bucket:

GetBucketLocation requires the requester to be the owner of the bucket.

https://github.com/aws/aws-sdk-go/issues/720#issuecomment-229208282

The S3 team got back with me and suggested the best API to use is HEAD bucket. GetBucketLocation uses a more complex permissions model where HeadBucket can be called by anyone.

https://github.com/aws/aws-sdk-go/issues/720#issuecomment-243891223

https://turbot-community.slack.com/archives/C044P668806/p1708367391732849?thread_ts=1708357736.103139&cid=C044P668806

pdecat avatar Apr 12 '24 06:04 pdecat

Here's a related issue mentioning the error I'm facing https://github.com/turbot/steampipe-plugin-aws/issues/1713

pdecat avatar Apr 12 '24 06:04 pdecat

FYI, instead of using http.Head you can also use the manager.GetBucketRegion function in AWS SDK for Go v2.

vpartington avatar Apr 12 '24 09:04 vpartington

Hi @vpartington,

FYI, instead of using http.Head you can also use the manager.GetBucketRegion function in AWS SDK for Go v2.

Interesting, this helper function does seem to do an unauthenticated HTTP HEAD request too (which is somewhat confusing because there's an actual HeadBucket API and it states All HeadBucket requests must be authenticated and signed by using IAM credentials).

image

And it probably handles other uncommon cases as described in https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/s3#HeadBucketInput.

pdecat avatar Apr 12 '24 10:04 pdecat

Anyway, I've not faced any issue yet since I've switched to using HTTP HEAD two months ago.

And it works even if the bucket is not yours:

# curl -HEAD -i https://s3.amazonaws.com/commoncrawl
HTTP/1.1 403 Forbidden
x-amz-bucket-region: us-east-1
x-amz-request-id: J0AD186X1H9EK40X
x-amz-id-2: OrTMgtyo12JoQRDHJAAEXmBWCb78wpt6XMit/Hv3lw7+LlAKr0ywEL1DtKGHlq733ojIHtJ0kSs=
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Fri, 12 Apr 2024 10:17:30 GMT
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>J0AD186X1H9EK40X</RequestId><HostId>OrTMgtyo12JoQRDHJAAEXmBWCb78wpt6XMit/Hv3lw7+LlAKr0ywEL1DtKGHlq733ojIHtJ0kSs=</HostId></Error>

Note the x-amz-bucket-region: us-east-1 header.

pdecat avatar Apr 12 '24 10:04 pdecat

The HeadBucket docs are indeed very confusing. I spent the last few days banging my head against the wall about this. Yesterday I settled on doing a http.Head, just like you do.

Just as I was writing a comment for why I do http.Head in our code base this morning, I found that manager.GetBucketRegion method. It does seem to handle some exotic use cases that I don't care about.

For me having less code to test or maintain is the reason I threw away my code. But then I have not had my code running successfully for a while, so I would also stick with what I've got if I were you.

Just thought I'd reach out because I saw you had struggled with the same problem in this MR.

Take care!

vpartington avatar Apr 12 '24 10:04 vpartington

@pdecat Sorry for the long response time (again)! I've left a few follow-up questions/suggestions, can you please have a look when you get a chance? Thanks!

cbruno10 avatar May 23 '24 19:05 cbruno10

Hi @cbruno10, I've implemented the requested changes. PTAL :)

pdecat avatar May 24 '24 12:05 pdecat

Can confirm everything also works fine with the use of manager.GetBucketRegion() instead of plain HTTP HEAD requests.

pdecat avatar May 27 '24 14:05 pdecat

Thanks so much @pdecat for this PR, and also to @vpartington for sharing info on manager.GetBucketRegion()!

cbruno10 avatar May 28 '24 19:05 cbruno10