S3 backend does not work with VPC Interface Endpoints (dualstack not supported)
Describe the bug
I am filing this as a software bug, although for now it could be considered a documentation bug, because it cannot be fixed until minio/minio-go#1766 is resolved. The library auto-detects AWS endpoints and forcibly rewrites them to their dualstack variant, which is surprising and does not work in our context.
We are deploying Mimir in an enterprise environment. One of the policies (enforced by central firewalling and/or routing) is that we can't connect to any public IPv4 addresses but must stay within the RFC1918 IP range. This restriction also applies to the public AWS endpoints.
To be able to use them, we deploy VPC Interface Endpoints and enable the Private DNS option, which overrides s3.region.amazonaws.com in the local resolver and points it to the local endpoint. But S3 VPC Endpoints do not support dualstack yet, so the DNS name s3.dualstack.region.amazonaws.com is not overridden.
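To make the mismatch concrete, here is a minimal Python sketch. The function `to_dualstack` is hypothetical and only mimics the effect described above; it is not minio-go's actual code, and it does not model the bucket name being prepended to the host. The set `PRIVATE_DNS_NAMES` is likewise an assumption standing in for the names that the Private DNS option overrides.

```python
import re

# Hypothetical stand-in for the rewrite described above; NOT minio-go's real code.
_S3_REGIONAL = re.compile(r"^s3\.([a-z0-9-]+)\.amazonaws\.com$")


def to_dualstack(endpoint: str) -> str:
    """Rewrite s3.<region>.amazonaws.com to its dualstack variant.

    Any other host name (for example a VPC endpoint-specific name) is
    returned unchanged.
    """
    match = _S3_REGIONAL.match(endpoint)
    if match is None:
        return endpoint
    return f"s3.dualstack.{match.group(1)}.amazonaws.com"


# Assumption: the only name the Private DNS option overrides in our resolver.
PRIVATE_DNS_NAMES = {"s3.eu-central-1.amazonaws.com"}

configured = "s3.eu-central-1.amazonaws.com"
dialed = to_dualstack(configured)

# The configured name is overridden, but the name that is actually dialed is not.
print(configured in PRIVATE_DNS_NAMES, dialed in PRIVATE_DNS_NAMES)
```

Under these assumptions the configured name is covered by the Private DNS override while the rewritten name is not, so the request leaves the VPC and is dropped by the firewall.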
Our current workaround is to use the Regional Names. So this works (but is complicated to maintain across accounts):
common:
  storage:
    backend: s3
    s3:
      endpoint: bucket.vpce-0c890e7af692eca77-54c6308e.s3.eu-central-1.vpce.amazonaws.com
      bucket_name: example-mimir-int-main
      region: eu-central-1
But this doesn't:
common:
  storage:
    backend: s3
    s3:
      endpoint: s3.eu-central-1.amazonaws.com
      bucket_name: example-mimir-int-main
      region: eu-central-1
To Reproduce
I think these steps should be enough to emulate our environment:
- Prepare your VPC (subnets, resolvers, ...)
- Create a security group which allows communication within the VPC CIDR range only
- Create an S3 VPC Endpoint with Private DNS enabled
- Create an EC2 instance, install Mimir
- Configure Mimir to use an S3 storage backend using the endpoint name s3.eu-central-1.amazonaws.com (adapt to your region)
- Start Mimir (2.11.0)
Expected behavior
Mimir should resolve s3.eu-central-1.amazonaws.com to an IP address within the VPC and start up.
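One way to verify this expectation from the EC2 instance is sketched below in Python. The helper names (`is_rfc1918`, `resolve_ipv4`, `all_private`) are made up for this illustration; only the standard library is used.

```python
from __future__ import annotations

import ipaddress
import socket

# The three private IPv4 ranges defined by RFC1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address is an IPv4 address inside an RFC1918 range."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and any(ip in net for net in RFC1918_NETWORKS)


def resolve_ipv4(hostname: str) -> list[str]:
    """Resolve a host name to its IPv4 addresses using the local resolver."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def all_private(hostname: str) -> bool:
    """Return True if the name resolves and every address is RFC1918."""
    addresses = resolve_ipv4(hostname)
    return bool(addresses) and all(is_rfc1918(a) for a in addresses)
```

With Private DNS enabled, `all_private("s3.eu-central-1.amazonaws.com")` should hold inside the VPC, whereas the same check for the `s3.dualstack.eu-central-1.amazonaws.com` name is expected to fail because that name is not overridden.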
Environment
- Infrastructure: AWS EC2
- Deployment tool: Ansible
Additional Context
Feb 14 13:08:40 i-0cb71db391d615e9f.eu-central-1.compute.internal mimir[119162]: ts=2024-02-14T13:08:40.538637541Z caller=memberlist_client.go:561 level=info msg="memberlist fast-join finished" joined_nodes=3 elapsed_time=131.174784ms
Feb 14 13:08:40 i-0cb71db391d615e9f.eu-central-1.compute.internal mimir[119162]: ts=2024-02-14T13:08:40.540366115Z caller=memberlist_client.go:573 level=info phase=startup msg="joining memberlist cluster" join_members=example-mimir-int-0.example-sandbox-1.eu-central-1.aws.example.com,example-mimir-int-1.example-sandbox-1.eu-central-1.aws.example.com,example-mimir-int-2.example-sandbox-1.eu-central-1.aws.example.com
Feb 14 13:08:40 i-0cb71db391d615e9f.eu-central-1.compute.internal mimir[119162]: ts=2024-02-14T13:08:40.721495949Z caller=memberlist_client.go:580 level=info phase=startup msg="joining memberlist cluster succeeded" reached_nodes=3 elapsed_time=181.04815ms
Feb 14 13:09:10 i-0cb71db391d615e9f.eu-central-1.compute.internal mimir[119162]: ts=2024-02-14T13:09:10.434945832Z caller=sanity_check.go:115 level=warn msg="Unable to successfully connect to configured object storage (will retry)" err="3 errors: blocks storage: unable to successfully send a request to object storage: Get \"https://example-mimir-int-main.s3.dualstack.eu-central-1.amazonaws.com/blocks/sanity-check-at-startup\": context deadline exceeded; alertmanager storage: unable to successfully send a request to object storage: Get \"https://example-mimir-int-main.s3.dualstack.eu-central-1.amazonaws.com/sanity-check-at-startup\": context deadline exceeded; ruler storage: unable to successfully send a request to object storage: Get \"https://example-mimir-int-main.s3.dualstack.eu-central-1.amazonaws.com/sanity-check-at-startup\": context deadline exceeded"
BTW: The odd behaviour of the library might also have been the cause of #6018 and was most definitely the cause of #5878. The fact that the endpoint name is rewritten behind one's back (even using a fixed lookup table to do so) should at least be documented.
The upstream issue minio/minio-go#1766 was fixed and a new option, Client.SetS3EnableDualstack, which defaults to true, was added.