Troubleshoot connecting Thanos QueryFrontend to AWS ElastiCache Redis with TLS
Connection issue when trying to connect Thanos QueryFrontend to an AWS ElastiCache Redis cluster with TLS enabled.
Thanos, Prometheus and Golang version used:
Thanos v0.32.2 with AWS ElastiCache Redis 6.2.6, with encryption in transit enabled
This is the configuration used:
--query-range.response-cache-config=
config:
  addr: XXX.cache.amazonaws.com:6379
  tls_enabled: true
type: "redis"
Object Storage Provider: AWS
What happened:
When I tried to connect to the AWS ElastiCache Redis cluster with TLS in transit, I got a connection error: context deadline exceeded.
I think it is caused by missing root certificates, because when I used an Alpine image and installed the root certificates, which include Amazon_Root_CA, this command worked well:
redis-cli -h XXX.cache.amazonaws.com -p 6379 --tls
I tried to add those certificates with an initContainer but I got the same connection issue.
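Because Thanos is a Go binary, the same check can be reproduced from Go instead of redis-cli. The following is a minimal sketch, not Thanos code: `newTLSConfig` and `probe` are hypothetical helper names, and the program only performs a TCP connect plus TLS handshake without sending any Redis command. With no CA file argument it relies on the system trust store, which is what the Alpine test above exercised.

```go
package main

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
	"time"
)

// newTLSConfig builds a client TLS config (hypothetical helper).
// An empty caFile leaves RootCAs nil, so Go uses the system trust store;
// otherwise only the certificates found in caFile are trusted.
func newTLSConfig(caFile string) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if caFile == "" {
		return cfg, nil
	}
	pemBytes, err := os.ReadFile(caFile)
	if err != nil {
		return nil, fmt.Errorf("read CA file: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, errors.New("no certificates found in CA file")
	}
	cfg.RootCAs = pool
	return cfg, nil
}

// probe connects to addr and completes a TLS handshake within the timeout.
func probe(addr string, cfg *tls.Config, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	d := tls.Dialer{Config: cfg}
	conn, err := d.DialContext(ctx, "tcp", addr)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe host:port [ca-file]")
		return
	}
	caFile := ""
	if len(os.Args) > 2 {
		caFile = os.Args[2]
	}
	cfg, err := newTLSConfig(caFile)
	if err != nil {
		fmt.Println("config error:", err)
		os.Exit(1)
	}
	if err := probe(os.Args[1], cfg, 5*time.Second); err != nil {
		fmt.Println("handshake failed:", err)
		os.Exit(1)
	}
	fmt.Println("handshake ok")
}
```

If this probe succeeds from inside the query-frontend container while Thanos still times out, the certificates are probably not the cause.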
What you expected to happen:
Thanos QueryFrontend connects successfully to the ElastiCache Redis cluster over TLS.
Full logs to relevant components:
ts=2023-09-18T16:59:05.422890922Z caller=main.go:135 level=error err="creating redis client: context deadline exceeded\ngithub.com/thanos-io/thanos/internal/cortex/chunk/cache.New\n\t/app/internal/cortex/chunk/cache/cache.go:108\ngithub.com/thanos-io/thanos/internal/cortex/querier/queryrange.NewResultsCacheMiddleware\n\t/app/internal/cortex/querier/queryrange/results_cache.go:187\ngithub.com/thanos-io/thanos/pkg/queryfrontend.newQueryRangeTripperware\n\t/app/pkg/queryfrontend/roundtrip.go:199\ngithub.com/thanos-io/thanos/pkg/queryfrontend.NewTripperware\n\t/app/pkg/queryfrontend/roundtrip.go:58\nmain.runQueryFrontend\n\t/app/cmd/thanos/query_frontend.go:254\nmain.registerQueryFrontend.func1\n\t/app/cmd/thanos/query_frontend.go:160\nmain.main\n\t/app/cmd/thanos/main.go:133\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\ncreate results cache middleware\ngithub.com/thanos-io/thanos/pkg/queryfrontend.newQueryRangeTripperware\n\t/app/pkg/queryfrontend/roundtrip.go:211\ngithub.com/thanos-io/thanos/pkg/queryfrontend.NewTripperware\n\t/app/pkg/queryfrontend/roundtrip.go:58\nmain.runQueryFrontend\n\t/app/cmd/thanos/query_frontend.go:254\nmain.registerQueryFrontend.func1\n\t/app/cmd/thanos/query_frontend.go:160\nmain.main\n\t/app/cmd/thanos/main.go:133\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nsetup tripperwares\nmain.runQueryFrontend\n\t/app/cmd/thanos/query_frontend.go:256\nmain.registerQueryFrontend.func1\n\t/app/cmd/thanos/query_frontend.go:160\nmain.main\n\t/app/cmd/thanos/main.go:133\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\npreparing query-frontend command failed\nmain.main\n\t/app/cmd/thanos/main.go:135\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"
The Redis cache configuration accepts a CA file; see the official docs: https://thanos.io/tip/components/store.md/#redis-index-cache
Thanks for the answer. I have already tried this: the ca-certificates package contains 4 Amazon Root certificates, so I merged them into a single file and added its path to tls_config:
--query-range.response-cache-config=
config:
  addr: XXX.cache.amazonaws.com:6379
  tls_enabled: true
  tls_config:
    ca_file: /etc/ssl/certs/ca-cert-Amazon_Root_CA.pem
type: "redis"
I got the same issue: creating redis client: context deadline exceeded
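When several root certificates are merged by hand, it can help to confirm that the resulting file really parses as the expected number of certificates. The sketch below is an illustration only: `parseBundle` is a hypothetical helper, and the bundle path is taken from the command line rather than from any Thanos setting.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
)

// parseBundle decodes every CERTIFICATE block in a PEM bundle
// (hypothetical helper). Non-certificate blocks are skipped.
func parseBundle(data []byte) ([]*x509.Certificate, error) {
	var certs []*x509.Certificate
	for {
		var block *pem.Block
		block, data = pem.Decode(data)
		if block == nil {
			break // no further PEM blocks
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil, fmt.Errorf("certificate %d: %w", len(certs)+1, err)
		}
		certs = append(certs, cert)
	}
	return certs, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: bundle ca-bundle.pem")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("read error:", err)
		os.Exit(1)
	}
	certs, err := parseBundle(data)
	if err != nil {
		fmt.Println("parse error:", err)
		os.Exit(1)
	}
	fmt.Printf("%d certificate(s) found\n", len(certs))
	for _, c := range certs {
		fmt.Println(" -", c.Subject.CommonName, "expires", c.NotAfter.Format("2006-01-02"))
	}
}
```

For the merged file described above, the expected result would be four entries; fewer would suggest that the concatenation dropped or corrupted a block.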
You can also use the insecure option to skip the cert check. 👀
Otherwise unfortunately I can't help anymore, I don't have this kind of setup.
I also tried skipping certificate verification with the option insecure_skip_verify: true, but it doesn't seem to have any impact on my issue.
Have you already tried without in-transit encryption? This seems weird... it's like a timeout somewhere.
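One way to tell a timeout apart from a certificate problem is to inspect the error type: in Go, a failed certificate check normally surfaces as an x509 error rather than as context deadline exceeded. The following is a rough sketch of such a classification; `classify` and its labels are made up for illustration and are not part of Thanos.

```go
package main

import (
	"context"
	"crypto/x509"
	"errors"
	"fmt"
	"net"
)

// classify maps a connection error to a coarse category (hypothetical helper).
func classify(err error) string {
	if err == nil {
		return "ok"
	}
	var unknownCA x509.UnknownAuthorityError
	var hostErr x509.HostnameError
	var netErr net.Error
	switch {
	case errors.As(err, &unknownCA):
		return "untrusted CA" // the server certificate chain is not trusted
	case errors.As(err, &hostErr):
		return "hostname mismatch" // the certificate does not cover the dialed name
	case errors.Is(err, context.DeadlineExceeded):
		return "timeout" // nothing usable came back before the deadline
	case errors.As(err, &netErr) && netErr.Timeout():
		return "timeout"
	default:
		return "other"
	}
}

func main() {
	// The error reported in this issue wraps a context deadline.
	reported := fmt.Errorf("creating redis client: %w", context.DeadlineExceeded)
	fmt.Println(classify(reported))

	// A missing root certificate would look different.
	fmt.Println(classify(x509.UnknownAuthorityError{}))
}
```

Under that reading, the reported error falls into the timeout category, which fits the observation that neither a CA file nor insecure_skip_verify changed the outcome.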
Yes, for the moment we use ElastiCache Redis without TLS encryption, and it works well.
I'm also experiencing this issue. I have redis configured with TLS, but query-frontend cannot connect to it.
I dug into the code a bit, and from what I can tell I don't think TLS has been implemented yet? https://github.com/thanos-io/thanos/blob/main/pkg/queryfrontend/config.go#L162
The NewCacheConfig parser for Redis doesn't seem to pass any TLS options over to the cortex cache config, so it doesn't get enabled.
As a quick test, I made the following change:
diff --git a/pkg/queryfrontend/config.go b/pkg/queryfrontend/config.go
index a5655199..80e7f3f0 100644
--- a/pkg/queryfrontend/config.go
+++ b/pkg/queryfrontend/config.go
@@ -166,6 +166,8 @@ func NewCacheConfig(logger log.Logger, confContentYaml []byte) (*cortexcache.Con
 				Expiration: config.Expiration,
 				DB: config.Redis.DB,
 				Password: flagext.Secret{Value: config.Redis.Password},
+				EnableTLS: true,
+				InsecureSkipVerify: true,
 			},
 			Background: cortexcache.BackgroundConfig{
 				WriteBackBuffer: config.Redis.MaxSetMultiConcurrency * config.Redis.SetMultiBatchSize,
After recompiling, query-frontend is able to connect to my Redis using TLS.
Unfortunately my Golang skills are quite limited, so I don't know how to fix this properly.
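A less hard-coded variant of that change would forward the user's settings instead of fixed `true` values. The sketch below only illustrates the idea with stand-in types: `userRedisConfig`, `userTLSConfig`, `cacheRedisConfig` and `toCacheConfig` are hypothetical names, and only the `EnableTLS` and `InsecureSkipVerify` field names are taken from the diff above. The real Thanos and Cortex structs may be shaped differently.

```go
package main

import "fmt"

// userTLSConfig stands in for the tls_config block of the YAML (assumed shape).
type userTLSConfig struct {
	InsecureSkipVerify bool // insecure_skip_verify
}

// userRedisConfig stands in for the user-facing Redis cache config (assumed shape).
type userRedisConfig struct {
	Addr       string        // addr
	TLSEnabled bool          // tls_enabled
	TLSConfig  userTLSConfig // tls_config
}

// cacheRedisConfig stands in for the internal cache config that the
// Redis client is built from (assumed shape).
type cacheRedisConfig struct {
	Endpoint           string
	EnableTLS          bool
	InsecureSkipVerify bool
}

// toCacheConfig copies the TLS settings across instead of dropping them.
func toCacheConfig(u userRedisConfig) cacheRedisConfig {
	return cacheRedisConfig{
		Endpoint:           u.Addr,
		EnableTLS:          u.TLSEnabled,
		InsecureSkipVerify: u.TLSConfig.InsecureSkipVerify,
	}
}

func main() {
	user := userRedisConfig{
		Addr:       "XXX.cache.amazonaws.com:6379",
		TLSEnabled: true,
	}
	fmt.Printf("%+v\n", toCacheConfig(user))
}
```

With a mapping like this, TLS stays off unless tls_enabled is set, and certificate verification stays on unless the user explicitly disables it.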
Would this then be categorized as a bug, since insecure_skip_verify and tls_enabled are not passed down during the conversion to cortexcache.RedisConfig?