Troubleshoot connecting Thanos QueryFrontend to AWS ElastiCache Redis with TLS

Open vaillani opened this issue 1 year ago • 9 comments

Connection issue when trying to connect Thanos QueryFrontend to an AWS ElastiCache Redis cluster with TLS enabled.

Thanos, Prometheus and Golang version used:

Thanos v0.32.2 with AWS ElastiCache Redis 6.2.6, with encryption in transit enabled

This is the configuration used:

--query-range.response-cache-config=
    config:
        addr: XXX.cache.amazonaws.com:6379
        tls_enabled: true
    type: "redis"

Object Storage Provider: AWS

What happened:

When I tried to connect to an AWS ElastiCache Redis cluster with encryption in transit enabled, the connection failed with: context deadline exceeded.

I think the cause is missing root certificates, because when I used an Alpine image and installed the root certificates, which include Amazon_Root_CA, the following command worked well:

redis-cli -h XXX.cache.amazonaws.com -p 6379 --tls

I tried to add those certificates with an initContainer but I got the same connection issue.
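
To tell a trust-store problem apart from a network-level timeout, the handshake can be reproduced outside Thanos. The following is a minimal Go sketch using only the standard library. The helper names buildTLSConfig and probe are made up for illustration, and the address and optional CA bundle path are passed as arguments. A missing root certificate would typically show up as an x509 verification error, whereas a hang that ends in a timeout points elsewhere.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// buildTLSConfig (hypothetical helper) returns a tls.Config that trusts the
// CAs in caPEM. An empty caPEM means "use the system root store".
func buildTLSConfig(caPEM []byte, serverName string) (*tls.Config, error) {
	cfg := &tls.Config{ServerName: serverName, MinVersion: tls.VersionTLS12}
	if len(caPEM) == 0 {
		return cfg, nil
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificates could be parsed from the CA bundle")
	}
	cfg.RootCAs = pool
	return cfg, nil
}

// probe (hypothetical helper) opens a TCP connection to addr and completes a
// TLS handshake. The timeout covers both the dial and the handshake.
func probe(addr string, cfg *tls.Config, timeout time.Duration) error {
	dialer := &net.Dialer{Timeout: timeout}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, cfg)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe host:port [ca-bundle.pem]")
		return
	}
	addr := os.Args[1]
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		fmt.Println("bad address:", err)
		return
	}
	var caPEM []byte
	if len(os.Args) > 2 {
		if caPEM, err = os.ReadFile(os.Args[2]); err != nil {
			fmt.Println("reading CA bundle:", err)
			return
		}
	}
	cfg, err := buildTLSConfig(caPEM, host)
	if err != nil {
		fmt.Println("building TLS config:", err)
		return
	}
	if err := probe(addr, cfg, 5*time.Second); err != nil {
		fmt.Println("handshake failed:", err)
		return
	}
	fmt.Println("TLS handshake succeeded")
}
```

Running it from the same pod or image as query-frontend, once with no CA argument and once with the merged Amazon root bundle, shows whether the system store, the custom bundle, or neither is enough for the handshake.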

What you expected to happen:

Thanos QueryFrontend connects successfully to the ElastiCache Redis cluster with TLS.

Full logs to relevant components:

ts=2023-09-18T16:59:05.422890922Z caller=main.go:135 level=error err="creating redis client: context deadline exceeded\ngithub.com/thanos-io/thanos/internal/cortex/chunk/cache.New\n\t/app/internal/cortex/chunk/cache/cache.go:108\ngithub.com/thanos-io/thanos/internal/cortex/querier/queryrange.NewResultsCacheMiddleware\n\t/app/internal/cortex/querier/queryrange/results_cache.go:187\ngithub.com/thanos-io/thanos/pkg/queryfrontend.newQueryRangeTripperware\n\t/app/pkg/queryfrontend/roundtrip.go:199\ngithub.com/thanos-io/thanos/pkg/queryfrontend.NewTripperware\n\t/app/pkg/queryfrontend/roundtrip.go:58\nmain.runQueryFrontend\n\t/app/cmd/thanos/query_frontend.go:254\nmain.registerQueryFrontend.func1\n\t/app/cmd/thanos/query_frontend.go:160\nmain.main\n\t/app/cmd/thanos/main.go:133\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\ncreate results cache middleware\ngithub.com/thanos-io/thanos/pkg/queryfrontend.newQueryRangeTripperware\n\t/app/pkg/queryfrontend/roundtrip.go:211\ngithub.com/thanos-io/thanos/pkg/queryfrontend.NewTripperware\n\t/app/pkg/queryfrontend/roundtrip.go:58\nmain.runQueryFrontend\n\t/app/cmd/thanos/query_frontend.go:254\nmain.registerQueryFrontend.func1\n\t/app/cmd/thanos/query_frontend.go:160\nmain.main\n\t/app/cmd/thanos/main.go:133\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nsetup tripperwares\nmain.runQueryFrontend\n\t/app/cmd/thanos/query_frontend.go:256\nmain.registerQueryFrontend.func1\n\t/app/cmd/thanos/query_frontend.go:160\nmain.main\n\t/app/cmd/thanos/main.go:133\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\npreparing query-frontend command failed\nmain.main\n\t/app/cmd/thanos/main.go:135\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"

vaillani avatar Sep 19 '23 07:09 vaillani

The Redis cache configuration accepts a CA file; see the official docs: https://thanos.io/tip/components/store.md/#redis-index-cache

douglascamata avatar Sep 19 '23 10:09 douglascamata

Thanks for the answer. I have already tried this: the ca-certificates package ships 4 Amazon Root certificates, so I merged them into a single file and added its path to tls_config:

--query-range.response-cache-config=
    config:
        addr: XXX.cache.amazonaws.com:6379
        tls_enabled: true
        tls_config:
            ca_file: /etc/ssl/certs/ca-cert-Amazon_Root_CA.pem
    type: "redis"

I got the same issue: creating redis client: context deadline exceeded

vaillani avatar Sep 19 '23 12:09 vaillani

You can use the insecure option to skip the cert check. 👀

Otherwise unfortunately I can't help anymore, I don't have this kind of setup.

douglascamata avatar Sep 19 '23 14:09 douglascamata

I also tried skipping the check with the option insecure_skip_verify: true, but it doesn't seem to have any impact on my issue.

vaillani avatar Sep 19 '23 14:09 vaillani

Did you already try without in-transit encryption? This seems weird... it's like a timeout somewhere.

douglascamata avatar Sep 19 '23 14:09 douglascamata

Yes, for the moment we use ElastiCache Redis without TLS encryption and it works well.

vaillani avatar Sep 19 '23 15:09 vaillani

I'm also experiencing this issue. I have redis configured with TLS, but query-frontend cannot connect to it.

I dug into the code a bit, and from what I can tell I don't think TLS has been implemented yet? https://github.com/thanos-io/thanos/blob/main/pkg/queryfrontend/config.go#L162

The NewCacheConfig parser for Redis doesn't seem to pass any TLS options over to the cortex cache config, so it doesn't get enabled.

As a quick test, I made the following change:

diff --git a/pkg/queryfrontend/config.go b/pkg/queryfrontend/config.go
index a5655199..80e7f3f0 100644
--- a/pkg/queryfrontend/config.go
+++ b/pkg/queryfrontend/config.go
@@ -166,6 +166,8 @@ func NewCacheConfig(logger log.Logger, confContentYaml []byte) (*cortexcache.Con
                                Expiration: config.Expiration,
                                DB:         config.Redis.DB,
                                Password:   flagext.Secret{Value: config.Redis.Password},
+                               EnableTLS:  true,
+                               InsecureSkipVerify:  true,
                        },
                        Background: cortexcache.BackgroundConfig{
                                WriteBackBuffer:     config.Redis.MaxSetMultiConcurrency * config.Redis.SetMultiBatchSize,

I recompiled, and query-frontend was able to connect to my Redis using TLS.

Unfortunately my Golang skills are quite limited, so I don't know how to fix this properly.
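
A less hard-coded variant of that quick test would forward the user's settings instead of forcing both fields to true. The sketch below only illustrates the mapping idea: both struct types are hypothetical stand-ins defined here, not the real Thanos or Cortex types, and only the field names EnableTLS and InsecureSkipVerify are taken from the diff above.

```go
package main

import "fmt"

// frontendRedisConfig is a hypothetical stand-in for the user-facing Redis
// settings (tls_enabled and insecure_skip_verify in the YAML config).
type frontendRedisConfig struct {
	Addr               string
	TLSEnabled         bool
	InsecureSkipVerify bool
}

// cortexRedisConfig is a hypothetical stand-in for the internal cache config.
// Only EnableTLS and InsecureSkipVerify mirror names seen in the diff.
type cortexRedisConfig struct {
	Endpoint           string
	EnableTLS          bool
	InsecureSkipVerify bool
}

// toCortexRedisConfig copies the TLS-related settings across so that they are
// driven by the user's configuration rather than hard-coded.
func toCortexRedisConfig(c frontendRedisConfig) cortexRedisConfig {
	return cortexRedisConfig{
		Endpoint:           c.Addr,
		EnableTLS:          c.TLSEnabled,
		InsecureSkipVerify: c.InsecureSkipVerify,
	}
}

func main() {
	out := toCortexRedisConfig(frontendRedisConfig{
		Addr:       "XXX.cache.amazonaws.com:6379",
		TLSEnabled: true,
	})
	fmt.Printf("%+v\n", out)
}
```

With a mapping like this, tls_enabled: true without insecure_skip_verify would keep certificate verification on, which the hard-coded test change does not.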

mhamzahkhan avatar Oct 26 '23 23:10 mhamzahkhan

Would this then be categorized as a bug, since insecure_skip_verify and tls_enabled are not passed down during the conversion to cortexcache.RedisConfig?

gnomeria avatar Jan 18 '24 02:01 gnomeria