
Redis client can't connect to server

Open · Supporterino opened this issue 2 years ago · 11 comments

Discussed in https://github.com/thanos-io/thanos/discussions/6013

Originally posted by Supporterino on January 3, 2023:

Hello guys,

I am updating my Thanos stack to v0.30.0 and want to switch to Redis as the cache provider. I set up a Redis v7 cluster with the Bitnami Helm chart and use the following cache configuration (this example is for the query-range cache):

config:
  addr: "redis-redis-cluster-0.redis-redis-cluster-headless:6379,redis-redis-cluster-1.redis-redis-cluster-headless:6379,redis-redis-cluster-2.redis-redis-cluster-headless:6379"
  password: "SECURE-PASSWORD"
  db: 0
  dial_timeout: 5s
  read_timeout: 3s
  write_timeout: 3s
  pool_size: 100
  min_idle_conns: 10
  idle_timeout: 5m0s
  max_conn_age: 0s
  max_get_multi_concurrency: 100
  get_multi_batch_size: 100
  max_set_multi_concurrency: 100
  set_multi_batch_size: 100
  tls_enabled: false
  cache_size: 1GiB
type: "REDIS"

However, my Redis instance receives no load, and the query frontend only logs the following:

level=error ts=2023-01-03T10:41:45.346637454Z caller=redis_cache.go:46 msg="error connecting to redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=info ts=2023-01-03T10:41:45.347196342Z caller=query_frontend.go:339 msg="starting query frontend"
level=info ts=2023-01-03T10:41:45.347215738Z caller=intrumentation.go:56 msg="changing probe status" status=ready
level=info ts=2023-01-03T10:41:45.347303382Z caller=intrumentation.go:75 msg="changing probe status" status=healthy
level=info ts=2023-01-03T10:41:45.34734474Z caller=http.go:73 service=http/server component=query-frontend msg="listening for requests and metrics" address=0.0.0.0:9090
level=info ts=2023-01-03T10:41:45.347625667Z caller=tls_config.go:232 service=http/server component=query-frontend msg="Listening on" address=[::]:9090
level=info ts=2023-01-03T10:41:45.347645684Z caller=tls_config.go:235 service=http/server component=query-frontend msg="TLS is disabled." http2=false address=[::]:9090
level=error ts=2023-01-03T10:45:24.948875993Z caller=redis_cache.go:75 msg="failed to get from redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=error ts=2023-01-03T10:45:25.407642957Z caller=redis_cache.go:103 msg="failed to put to redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=error ts=2023-01-03T10:45:25.426421268Z caller=redis_cache.go:75 msg="failed to get from redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=error ts=2023-01-03T10:45:25.519582802Z caller=redis_cache.go:75 msg="failed to get from redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=error ts=2023-01-03T10:45:25.612233941Z caller=redis_cache.go:75 msg="failed to get from redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=error ts=2023-01-03T10:45:25.709052914Z caller=redis_cache.go:75 msg="failed to get from redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"

What am I missing?

Supporterino · Jan 05 '23 06:01

I answered this in the discussion post. It is related to https://github.com/go-redis/redis/issues/2085: Redis v7 requires go-redis v9, but we are still using go-redis v8.

To support this use case we should upgrade the go-redis library. However, the upgrade does not appear to be backward compatible, so upgrading would break Redis 6.x.

yeya24 · Jan 05 '23 06:01
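The go-redis v8 incompatibility described above can be illustrated with a small model. My understanding is that Redis 7 appends a fourth "networking metadata" element to each node entry in the `CLUSTER SLOTS` reply. A parser that only accepts 2 or 3 elements then fails with exactly the message seen in the logs.

This is a minimal sketch, not go-redis's actual code. `nodeAddr`, `parseNodeStrict`, `parseNodeTolerant` and `parseFields` are hypothetical names, and the reply is modelled as plain `[]interface{}` values.

```go
package main

import "fmt"

// nodeAddr is a hypothetical, simplified model of one node entry in a
// CLUSTER SLOTS reply: [ip, port, node-id, (Redis 7+) networking metadata].
type nodeAddr struct {
	IP   string
	Port int64
	ID   string
}

// parseFields extracts ip, port and (optionally) node id, ignoring extra elements.
func parseFields(entry []interface{}) (nodeAddr, error) {
	ip, ok := entry[0].(string)
	if !ok {
		return nodeAddr{}, fmt.Errorf("ip: unexpected type %T", entry[0])
	}
	port, ok := entry[1].(int64)
	if !ok {
		return nodeAddr{}, fmt.Errorf("port: unexpected type %T", entry[1])
	}
	var id string
	if len(entry) >= 3 {
		id, _ = entry[2].(string)
	}
	return nodeAddr{IP: ip, Port: port, ID: id}, nil
}

// parseNodeStrict mimics a check that only accepts 2 or 3 elements;
// it produces the same error text as in the query-frontend logs.
func parseNodeStrict(entry []interface{}) (nodeAddr, error) {
	if n := len(entry); n < 2 || n > 3 {
		return nodeAddr{}, fmt.Errorf("got %d elements in cluster info address, expected 2 or 3", n)
	}
	return parseFields(entry)
}

// parseNodeTolerant accepts 2 or more elements and ignores trailing fields,
// such as the metadata element that Redis 7 adds.
func parseNodeTolerant(entry []interface{}) (nodeAddr, error) {
	if len(entry) < 2 {
		return nodeAddr{}, fmt.Errorf("got %d elements in cluster info address, expected at least 2", len(entry))
	}
	return parseFields(entry)
}

func main() {
	// Redis 7-style entry: the 4th element holds networking metadata (possibly empty).
	redis7Entry := []interface{}{"10.0.0.1", int64(6379), "node-id", []interface{}{}}

	_, err := parseNodeStrict(redis7Entry)
	fmt.Println("strict:", err)

	n, err := parseNodeTolerant(redis7Entry)
	fmt.Println("tolerant:", n.IP, n.Port, err)
}
```

The tolerant variant shows why a newer client can read both Redis 6 and Redis 7 replies. Whether a given go-redis release actually does this is best confirmed in its changelog.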

As a temporary workaround I downgraded my Redis cluster to 6.2.8, since it is only used by Thanos. Now it works like a charm. Thank you for your help. It might be useful to add a short note about this in the Redis cache section.

Supporterino · Jan 09 '23 07:01

Why does the Store Gateway work with Redis 7 while the Query Frontend does not?

kforsthoevel · Jan 31 '23 10:01

Would it be an option to use this Redis client (https://github.com/rueian/rueidis) via the cacheutils internal package?

Schmitze333 · Jan 31 '23 10:01

Yes, it would be great to use the same rueidis client for the query frontend's Redis cache as well.

yeya24 · Feb 01 '23 20:02

@yeya24 I'm working on a PR to use rueidis as the Redis client in the query-frontend as well, but something about the Redis configs puzzles me. Should I discuss it in this issue or in a WIP PR?

Schmitze333 · Feb 17 '23 12:02

Hi,

I recently tried to enable the Redis cache for the query-frontend component, and it failed with this error:

{"caller":"redis_cache.go:75","err":"ERR unknown command 'select', with args beginning with: '1' ","level":"error","msg":"failed to get from redis","name":"redis","ts":"2023-06-07T15:58:53.129616978Z"}

@douglascamata suggested there might be an incompatibility between client and server, so I tried the following Redis versions, but all of them failed with the same error:

  • 7.0.8
  • 6.2.12
  • 6.0.19

I was unable to test versions below 6.0 because the operator I use to deploy Redis to k8s does not support such old versions ;)

Thanos 0.31.0

michalschott · Jun 08 '23 09:06

After some back and forth with @michalschott on Slack, he found that most of his problems come from using a Redis Cluster for HA.

So for anyone out there using Redis Cluster: you have to leave the DB unset. Otherwise the client fails with an error like "ERR SELECT is not allowed in cluster mode", which comes from the DB selection (SELECT) command.

douglascamata · Jun 13 '23 13:06
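The Redis Cluster advice above can be checked at config-load time instead of failing on every cache request. This is a minimal sketch of such a check. `cacheConfig`, `validate` and `errSelectInCluster` are hypothetical names, not Thanos APIs.

The sketch also assumes that more than one comma-separated address means cluster mode. This mirrors how go-redis's `UniversalClient` picks a cluster client, but it is not necessarily how Thanos decides.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// cacheConfig is a hypothetical subset of the YAML cache config (addr, db).
type cacheConfig struct {
	Addr string // comma-separated host:port list
	DB   int
}

// errSelectInCluster is a hypothetical error used for illustration only.
var errSelectInCluster = errors.New("db must be left unset (0) for Redis Cluster: SELECT is not allowed in cluster mode")

// addrs splits the comma-separated addr field and drops empty entries.
func (c cacheConfig) addrs() []string {
	var out []string
	for _, a := range strings.Split(c.Addr, ",") {
		if a = strings.TrimSpace(a); a != "" {
			out = append(out, a)
		}
	}
	return out
}

// validate rejects a non-zero DB when several addresses are configured,
// assuming that multiple addresses imply a Redis Cluster.
func (c cacheConfig) validate() error {
	if len(c.addrs()) > 1 && c.DB != 0 {
		return errSelectInCluster
	}
	return nil
}

func main() {
	cfg := cacheConfig{Addr: "redis-0:6379,redis-1:6379,redis-2:6379", DB: 1}
	fmt.Println(cfg.validate()) // non-nil: DB set while targeting a cluster

	cfg.DB = 0
	fmt.Println(cfg.validate()) // nil: DB left unset
}
```

This heuristic misses a cluster configured through a single seed address. A more robust check would ask the server itself, for example by reading `cluster_enabled` from `INFO cluster`.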

I get errors using a Redis v6 cluster with the query frontend, even with the DB unset. Example errors:

msg="failed to get from redis" name=redis err="MOVED 10784 10.0.200.62:6379"
msg="failed to put to redis" name=redis err="EXECABORT Transaction discarded because of previous errors."

This happens when I point the query frontend at the same AWS ElastiCache cluster that I use for the store component. I'm happy to open a separate issue if needed.

dschaaff · Jun 24 '23 17:06
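The `MOVED 10784 10.0.200.62:6379` error above is Redis Cluster's redirect. The node that received the command does not own the key's hash slot, so it tells the client which node does. A cluster-aware client follows the redirect, while a client that treats the cluster as a single node surfaces it as an error.

One plausible reading of the `EXECABORT` line is that a command queued inside a transaction hit such an error, so the whole transaction was discarded. That is an inference, not confirmed in this thread.

The sketch below computes a key's slot as described in the Redis Cluster specification: CRC16 (XMODEM variant) modulo 16384, hashing only the `{...}` hash tag when one is present. It also parses the redirect message. `crc16`, `keySlot` and `parseMoved` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// crc16 implements CRC-16/XMODEM (poly 0x1021, init 0), the variant used for cluster key slots.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// keySlot returns the cluster hash slot for key. If the key contains a
// non-empty {hash tag}, only the tag is hashed, so related keys share a slot.
func keySlot(key string) int {
	if start := strings.IndexByte(key, '{'); start >= 0 {
		if end := strings.IndexByte(key[start+1:], '}'); end > 0 {
			key = key[start+1 : start+1+end]
		}
	}
	return int(crc16([]byte(key)) % 16384)
}

// parseMoved extracts the slot and target address from a "MOVED <slot> <host:port>" error.
func parseMoved(msg string) (slot int, addr string, ok bool) {
	fields := strings.Fields(msg)
	if len(fields) != 3 || fields[0] != "MOVED" {
		return 0, "", false
	}
	s, err := strconv.Atoi(fields[1])
	if err != nil {
		return 0, "", false
	}
	return s, fields[2], true
}

func main() {
	fmt.Println("slot of foo:", keySlot("foo"))

	slot, addr, ok := parseMoved("MOVED 10784 10.0.200.62:6379")
	fmt.Println(slot, addr, ok)
}
```

Hash tags are also how multi-key operations are kept on one node. Keys without a shared tag generally land in different slots, which Redis Cluster rejects for multi-key commands and transactions.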

Hey folks, can you try again now that https://github.com/thanos-io/thanos/pull/6520 has been merged? I believe it should be fixed.

douglascamata · Jul 13 '23 10:07

> Hey folks, can you try again after #6520 got merged? Should be fixed, I believe.

Still not working for me, with exactly the same config as the store gateway.

calvinbui · Apr 09 '24 01:04