[8.14] Fix handling of custom Endpoint when using S3 + SQS

Open strawgate opened this issue 1 year ago • 2 comments

Proposed commit message

Fix issues described in https://github.com/elastic/beats/issues/39706 that prevent using a custom endpoint with S3 + SQS.

Users can work around this issue with S3 bucket polling, which still works with a custom endpoint; the breakage only appears once SQS is added. We also need to publish a new version of the AWS integration that exposes the endpoint field on the relevant AWS integrations, which is tracked separately.

Proposed Fixes for Main: https://github.com/elastic/beats/pull/39722

Fixes for 8.14:

  • [x] Fix saving broken region to the configuration when using a custom endpoint with SQS queue_url. I've fixed here on top of 8.14 but it is separately already fixed on Main. Thanks @faec!
  • [x] Fix handling of default_region. Not fixed on 8.14 but fixed on Main. Thanks @faec!
  • [x] Fix exception when we can parse the URL from the queue_url, there is no region in the config, and there's a region mismatch in the parsing. I've fixed here on top of 8.14 but it is separately already fixed on Main. Thanks @faec!
  • [x] Fix parsing the region name from a custom endpoint
  • [x] Fix failing region parsing if default_region is set but region is not. I've fixed here on top of 8.14 but it is separately already fixed on Main. Thanks @faec!
  • [x] Use the default endpoint resolver if the endpoint begins with s3

Optional for 8.14:

  • [x] Keep the current behavior (overwriting every service to use the Endpoint value) when the endpoint does not begin with s3

Limit the scope of the endpoint resolver:

  1. When users provide us an endpoint that begins with S3, do not set an Endpoint Resolver but set the Endpoint field so the Default resolver can generate URLs using the Endpoint but with a unique domain for each service (sqs.us-east-1, dynamodb.us-east-1, s3.us-east-1, ...)
  2. When users provide us an endpoint that doesn't begin with S3, use the exact same URL AS-IS for every single service (sqs endpoint = endpoint, s3 endpoint = endpoint). This allows this to be backwards compatible.
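The two cases above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `usesDefaultResolver` and `serviceEndpoint` are hypothetical names, the check for a leading `s3.` host label is an assumption about what "begins with S3" means, and swapping that label for the service name is a simplified stand-in for what the SDK's default resolver does internally. The `minio.example.internal` host is a made-up example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// usesDefaultResolver reports whether the endpoint host starts with the
// "s3." label. Hypothetical helper: in that case no custom resolver would
// be installed and per-service hosts are derived instead.
func usesDefaultResolver(u *url.URL) bool {
	return strings.HasPrefix(u.Host, "s3.")
}

// serviceEndpoint returns the URL a given service would be called on.
// Assumption: for s3-prefixed endpoints the leading "s3" label is replaced
// by the service name; any other endpoint is returned unchanged for every
// service, which is the backwards-compatible behavior.
func serviceEndpoint(endpoint, service string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", fmt.Errorf("parsing endpoint: %w", err)
	}
	if u.Host == "" {
		return "", fmt.Errorf("endpoint %q has no host", endpoint)
	}
	if !usesDefaultResolver(u) {
		return endpoint, nil // case 2: same URL as-is for every service
	}
	u.Host = service + u.Host[len("s3"):] // case 1: per-service host
	return u.String(), nil
}

func main() {
	for _, ep := range []string{
		"https://s3.us-east-1.amazonaws.com",
		"https://minio.example.internal:9000",
	} {
		for _, svc := range []string{"s3", "sqs"} {
			got, err := serviceEndpoint(ep, svc)
			fmt.Println(ep, svc, got, err)
		}
	}
}
```

Keeping the as-is branch as the fallback means existing configurations that point every service at a single URL would behave as before.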

Checklist

  • [ ] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Hopefully none.

The getRegionFromQueueURL addition that handles custom endpoints could be removed entirely, in which case users would have to specify a region manually. That would make this PR a bit smaller.
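For readers unfamiliar with the inference involved, here is a minimal sketch of pulling a region out of a queue_url. It is not the PR's getRegionFromQueueURL: `regionFromQueueURL` and `resolveRegion` are hypothetical helpers, the `sqs.<region>.<domain>` host shape is assumed from the example URLs in this description, and the precedence order (explicit region, then queue_url, then default_region) is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// regionFromQueueURL extracts the second host label from a queue URL whose
// host is assumed to look like sqs.<region>.<domain>.
func regionFromQueueURL(queueURL string) (string, error) {
	u, err := url.Parse(queueURL)
	if err != nil {
		return "", fmt.Errorf("parsing queue_url: %w", err)
	}
	labels := strings.Split(u.Hostname(), ".")
	if len(labels) < 3 || labels[0] != "sqs" || labels[1] == "" {
		return "", fmt.Errorf("queue_url host %q is not of the form sqs.<region>.<domain>", u.Hostname())
	}
	return labels[1], nil
}

// resolveRegion picks a region. Assumed order for this sketch: an explicitly
// configured region wins, then the region parsed from queue_url, then
// default_region.
func resolveRegion(configured, defaultRegion, queueURL string) (string, error) {
	if configured != "" {
		return configured, nil
	}
	if region, err := regionFromQueueURL(queueURL); err == nil {
		return region, nil
	}
	if defaultRegion != "" {
		return defaultRegion, nil
	}
	return "", fmt.Errorf("no region configured and none could be inferred from %q", queueURL)
}

func main() {
	region, err := resolveRegion("", "", "https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path")
	fmt.Println(region, err)
}
```

Dropping the inference would reduce `resolveRegion` to the configured and default values only, which is the smaller alternative described above.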

How to test this PR locally

Log in to the AWS CLI and provide the following in a Filebeat config:

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path
  number_of_workers: 1
  region: us-east-1
  endpoint: https://s3.us-east-1.amazonaws.com

See that the SQS ReceiveMessage works and that you can publish an item to the bucket and get a result.

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path
  number_of_workers: 1
  endpoint: https://s3.us-east-1.amazonaws.com

See that the SQS ReceiveMessage works, because the region is inferred from the queue_url (which matches the endpoint), and that you can publish an item to the bucket and get a result.

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path
  number_of_workers: 1

See that the SQS ReceiveMessage works with no endpoint configured, because the region is inferred from the queue_url, and that you can publish an item to the bucket and get a result.

See that the following fails:

- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/946960629917/billeaston-s3-queue
  number_of_workers: 1
  endpoint: https://us-east-1.amazonaws.com
{"log.level":"warn","@timestamp":"2024-05-23T23:23:17.585-0500","log.logger":"input.aws-s3","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*s3Input).Run","file.name":"awss3/input.go","file.line":132},"message":"configured region disagrees with queue_url region: \"localtest\" != \"amazonaws\": using \"\"","service.name":"filebeat","id":"43D90D58192992F9","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-05-23T23:23:25.801-0500","log.logger":"input.aws-s3.sqs","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsReader).Receive","file.name":"awss3/sqs.go","file.line":68},"message":"SQS ReceiveMessage returned an error. Will retry after a short delay.","service.name":"filebeat","id":"43D90D58192992F9","queue_url":"https://sqs.localtest.amazonaws.com/946960629917/billeaston-s3-queue","error":{"message":"sqs ReceiveMessage failed: operation error SQS: ReceiveMessage, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://localtest.amazonaws.com/\": dial tcp: lookup localtest.amazonaws.com: no such host"},"ecs.version":"1.6.0"}

> lookup localtest.amazonaws.com: no such host

See that the following works but fails to connect (no such host):

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.localtest.abc.xyz/946960629917/billeaston-s3-queue
  number_of_workers: 1
  region: localtest
  endpoint: https://s3.localtest.abc.xyz
{"log.level":"warn","@timestamp":"2024-05-23T23:24:38.825-0500","log.logger":"input.aws-s3.sqs","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsReader).Receive","file.name":"awss3/sqs.go","file.line":68},"message":"SQS ReceiveMessage returned an error. Will retry after a short delay.","service.name":"filebeat","id":"56FBB4DE51C84BB9","queue_url":"https://sqs.localtest.abc.xyz/946960629917/billeaston-s3-queue","error":{"message":"sqs ReceiveMessage failed: operation error SQS: ReceiveMessage, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://sqs.localtest.amazonaws.com/\": dial tcp: lookup sqs.localtest.amazonaws.com: no such host"},"ecs.version":"1.6.0"}

Note that the endpoint begins with s3, but the failure message refers to sqs.localtest.amazonaws.com.

Use cases

Allow users who use custom-but-AWS domains to enjoy the benefits of S3 and SQS together.

strawgate, May 24 '24 02:05

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine, May 24 '24 02:05

@cmacknz added additional tests

strawgate, May 25 '24 00:05

> @cmacknz added additional tests

To have CI actually trigger the tests you need to add the aws label, otherwise they don't run.

alexsapran, May 27 '24 12:05

/test

alexsapran, May 27 '24 13:05

@andresrc @zmoog @bturquet can we please get an approval from the obs-cloud-monitoring team?

jlind23, May 28 '24 17:05