[8.14] Fix handling of custom Endpoint when using S3 + SQS
Proposed commit message
Fix issues described in https://github.com/elastic/beats/issues/39706 that prevent using a custom endpoint with S3 + SQS.
Users can work around this issue via S3 bucket polling; bucket polling still works fine with a custom endpoint, it is only adding SQS that breaks. We need to publish a new version of the AWS integration with the endpoint field exposed on the relevant AWS integrations, which is tracked here.
Proposed Fixes for Main: https://github.com/elastic/beats/pull/39722
Fixes for 8.14:
- [x] Fix saving a broken region to the configuration when using a custom endpoint with an SQS queue_url. Fixed here on top of 8.14; already fixed separately on Main. Thanks @faec!
- [x] Fix handling of default_region. Not yet fixed on 8.14 but already fixed on Main. Thanks @faec!
- [x] Fix an exception when we can parse the region from the queue_url, there is no region in the config, and there is a region mismatch in the parsing. Fixed here on top of 8.14; already fixed separately on Main. Thanks @faec!
- [x] Fix parsing the region name from a custom endpoint.
- [x] Fix region parsing failing when default_region is set but region is not. Fixed here on top of 8.14; already fixed separately on Main. Thanks @faec!
- [x] Use the default endpoint resolver if the endpoint begins with `s3`.
Optional for 8.14:
- [x] Keep the current behavior (overwriting every service to use the Endpoint value) when the endpoint does not begin with `s3`.
Limit the scope of the endpoint resolver (a sketch follows this list):
- When users provide an endpoint that begins with `s3`, do not set an endpoint resolver; set the Endpoint field instead, so the default resolver can generate URLs from the endpoint with a unique domain for each service (sqs.us-east-1, dynamodb.us-east-1, s3.us-east-1, ...).
- When users provide an endpoint that does not begin with `s3`, use the exact same URL as-is for every single service (sqs endpoint = endpoint, s3 endpoint = endpoint). This keeps the change backwards compatible.
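A minimal sketch of that rule, assuming the AWS SDK for Go v2; `applyEndpointOverride` is a hypothetical name for illustration, not the code in this PR (see https://github.com/elastic/beats/pull/39722 for the real change):

```go
package main

import (
	"net/url"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
)

// applyEndpointOverride illustrates the scoping rule above (hypothetical
// helper, not the PR's actual code). Endpoints whose host begins with "s3"
// are left to the SDK's default resolver, which derives a distinct host per
// service; any other endpoint is applied verbatim to every service,
// preserving the previous behavior.
func applyEndpointOverride(awsCfg *aws.Config, endpoint string) {
	if endpoint == "" {
		return
	}
	if u, err := url.Parse(endpoint); err == nil && strings.HasPrefix(u.Hostname(), "s3") {
		// s3-prefixed endpoint: keep the default resolver so each service
		// gets its own host (sqs.us-east-1..., dynamodb.us-east-1..., ...).
		return
	}
	// Backwards-compatible path: the exact same URL for every service.
	awsCfg.EndpointResolverWithOptions = aws.EndpointResolverWithOptionsFunc(
		func(service, region string, _ ...interface{}) (aws.Endpoint, error) {
			return aws.Endpoint{URL: endpoint}, nil
		})
}
```

The design choice here is that the `s3` prefix acts as the signal that the user wants per-service AWS hosts, while any other value keeps the old single-URL override.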
Checklist
- [ ] My code follows the style guidelines of this project
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have made corresponding changes to the default configuration files
- [x] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
Disruptive User Impact
Hopefully none.
The entire addition of getRegionFromQueueURL to handle custom endpoints could be removed, and users would just have to specify a region manually, which would make this change a bit smaller.
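For context, here is a hedged sketch of what that queue_url region inference amounts to; the actual getRegionFromQueueURL in x-pack/filebeat/input/awss3 differs in detail (it also reconciles the result with region, default_region, and the endpoint):

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// getRegionFromQueueURL is a simplified sketch, not the exact 8.14 code.
// It expects a host of the form sqs.<region>.<domain>, e.g.
// https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path.
func getRegionFromQueueURL(queueURL string) (string, error) {
	u, err := url.Parse(queueURL)
	if err != nil {
		return "", fmt.Errorf("invalid queue_url: %w", err)
	}
	parts := strings.SplitN(u.Hostname(), ".", 3)
	if len(parts) < 3 || parts[0] != "sqs" || parts[1] == "" {
		return "", errors.New("queue_url host does not match sqs.<region>.<domain>")
	}
	return parts[1], nil
}
```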
How to test this PR locally
Log in to the AWS CLI, then provide the following in a filebeat config:
```yaml
filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path
  number_of_workers: 1
  region: us-east-1
  endpoint: https://s3.us-east-1.amazonaws.com
```
See that SQS ReceiveMessage works, and that you can publish an item to the bucket and get a result.
```yaml
filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path
  number_of_workers: 1
  endpoint: https://s3.us-east-1.amazonaws.com
```
See that SQS ReceiveMessage works because the region is inferred from the queue_url (whose domain matches the endpoint), and that you can publish an item to the bucket and get a result.
```yaml
filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path
  number_of_workers: 1
```
See that SQS ReceiveMessage works because the region is inferred from the queue_url, and that you can publish an item to the bucket and get a result.
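To check the inference offline, a table-driven test against the hypothetical helper sketched earlier could look like this (the tests actually added in this PR live in the awss3 package and differ in detail):

```go
package main

import "testing"

func TestGetRegionFromQueueURL(t *testing.T) {
	cases := []struct {
		queueURL string
		want     string
		wantErr  bool
	}{
		// Mirrors the configs above: the region comes out of the queue host.
		{"https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path", "us-east-1", false},
		{"https://sqs.localtest.abc.xyz/946960629917/billeaston-s3-queue", "localtest", false},
		// No sqs.<region> prefix, so nothing can be inferred.
		{"https://us-east-1.amazonaws.com/123123123123123/queue_path", "", true},
	}
	for _, c := range cases {
		got, err := getRegionFromQueueURL(c.queueURL)
		if (err != nil) != c.wantErr || got != c.want {
			t.Errorf("getRegionFromQueueURL(%q) = %q, %v; want %q, wantErr=%v",
				c.queueURL, got, err, c.want, c.wantErr)
		}
	}
}
```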
See that the following fails:
```yaml
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/946960629917/billeaston-s3-queue
  number_of_workers: 1
  endpoint: https://us-east-1.amazonaws.com
```
{"log.level":"warn","@timestamp":"2024-05-23T23:23:17.585-0500","log.logger":"input.aws-s3","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*s3Input).Run","file.name":"awss3/input.go","file.line":132},"message":"configured region disagrees with queue_url region: \"localtest\" != \"amazonaws\": using \"\"","service.name":"filebeat","id":"43D90D58192992F9","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-05-23T23:23:25.801-0500","log.logger":"input.aws-s3.sqs","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsReader).Receive","file.name":"awss3/sqs.go","file.line":68},"message":"SQS ReceiveMessage returned an error. Will retry after a short delay.","service.name":"filebeat","id":"43D90D58192992F9","queue_url":"https://sqs.localtest.amazonaws.com/946960629917/billeaston-s3-queue","error":{"message":"sqs ReceiveMessage failed: operation error SQS: ReceiveMessage, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://localtest.amazonaws.com/\": dial tcp: lookup localtest.amazonaws.com: no such host"},"ecs.version":"1.6.0"}
> lookup localtest.amazonaws.com: no such host
See that the following works but fails to connect (no such host):
```yaml
filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.localtest.abc.xyz/946960629917/billeaston-s3-queue
  number_of_workers: 1
  region: localtest
  endpoint: https://s3.localtest.abc.xyz
```
{"log.level":"warn","@timestamp":"2024-05-23T23:24:38.825-0500","log.logger":"input.aws-s3.sqs","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsReader).Receive","file.name":"awss3/sqs.go","file.line":68},"message":"SQS ReceiveMessage returned an error. Will retry after a short delay.","service.name":"filebeat","id":"56FBB4DE51C84BB9","queue_url":"https://sqs.localtest.abc.xyz/946960629917/billeaston-s3-queue","error":{"message":"sqs ReceiveMessage failed: operation error SQS: ReceiveMessage, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://sqs.localtest.amazonaws.com/\": dial tcp: lookup sqs.localtest.amazonaws.com: no such host"},"ecs.version":"1.6.0"}
See that the endpoint begins with `s3` (s3.localtest.abc.xyz), but the failure message says sqs.localtest.amazonaws.com: the default resolver generated a distinct host for SQS.
Use cases
Allow users who use custom-but-AWS domains to enjoy the benefits of S3 and SQS together.
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
@cmacknz added additional tests
To have CI actually trigger the tests, you need to add the aws label; otherwise they don't run.
/test
@andresrc @zmoog @bturquet can we please get an approval from the obs-cloud-monitoring team?