[Bug] refreshing cluster slot owners after failed cluster is recovered

Open to266 opened this issue 1 year ago • 3 comments

  • Redis version - 6.2.7
  • Platform - linux
  • Using Docker and/or Kubernetes - yes
  • Deployment type - cluster

Describe the bug

If the cluster is in a genuinely misconfigured or failed state (which I'm not sure how to reproduce, and would generally like to avoid altogether) and then recovers, the fred clients do not seem able to refresh the cluster slot distribution (I presume) once the cluster is healthy again, whether it recovered on its own or was completely restarted or replaced.

As we retry on failed connections, all I can see are errors like the one below.

Logs

"timestamp":"2023-12-26T09:27:36.487891Z","level":"WARN","fields":{"message":"fred-G4vFWfuJWz: Possible cluster misconfiguration. Missing hash slot owner for Some(6606).","log.target":"fred::router::clustered","log.module_path":"fred::router::clustered","log.file":"/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/fred-6.2.1/src/router/clustered.rs","log.line":88},"target":"fred::router::clustered","threadName":"tokio-runtime-worker"
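For readers unfamiliar with the warning above, here is a minimal sketch of how a "missing hash slot owner" can arise from a stale slot map. It is not fred's implementation. The hashing follows the Redis Cluster specification (CRC16/XMODEM of the key, or of its hash tag, modulo 16384). `SlotRange`, `SlotMap`, `range`, and the node addresses are hypothetical names made up for illustration.

```rust
/// CRC16/XMODEM (polynomial 0x1021, initial value 0), the checksum the
/// Redis Cluster specification uses for key hashing.
fn crc16_xmodem(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 { (crc << 1) ^ 0x1021 } else { crc << 1 };
        }
    }
    crc
}

const CLUSTER_SLOTS: u16 = 16384;

/// Hash slot for a key. If the key contains a non-empty `{hash tag}`,
/// only the tag content is hashed, as described in the cluster spec.
fn key_slot(key: &[u8]) -> u16 {
    let hashed = match key.iter().position(|&b| b == b'{') {
        Some(open) => match key[open + 1..].iter().position(|&b| b == b'}') {
            Some(len) if len > 0 => &key[open + 1..open + 1 + len],
            _ => key,
        },
        None => key,
    };
    crc16_xmodem(hashed) % CLUSTER_SLOTS
}

/// Hypothetical cached view of one slot range and the node that owns it.
#[derive(Debug, Clone)]
struct SlotRange {
    start: u16,
    end: u16,
    owner: String,
}

/// Hypothetical client-side slot cache (an assumption, not fred's type).
struct SlotMap {
    ranges: Vec<SlotRange>,
}

impl SlotMap {
    /// Returns the owner of `slot`, or `None` if the cache has a gap there.
    fn owner_for(&self, slot: u16) -> Option<&str> {
        self.ranges
            .iter()
            .find(|r| r.start <= slot && slot <= r.end)
            .map(|r| r.owner.as_str())
    }

    /// Swaps in a freshly fetched slot layout, e.g. after the cluster recovers.
    fn replace(&mut self, fresh: Vec<SlotRange>) {
        self.ranges = fresh;
    }
}

/// Convenience constructor for the illustration.
fn range(start: u16, end: u16, owner: &str) -> SlotRange {
    SlotRange { start, end, owner: owner.to_string() }
}
```

If the cached map only holds ranges 0-5460 and 10923-16383, `owner_for(6606)` returns `None`, which corresponds to the situation the warning reports. The lookup only succeeds again once the cache is replaced with a layout that covers the gap, which is the refresh step this issue suspects is not happening.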

to266 avatar Dec 26 '23 09:12 to266

Hi @to266 , can you try with 7.1.1?

aembke avatar Jan 09 '24 05:01 aembke

Will do, but:

  • Not sure how quickly we'll manage to update fred in our repo in the first place
  • We will likely only see a similar level of load at the end of the month - so until then it should all be good regardless.

Having said that, thanks!

to266 avatar Jan 09 '24 09:01 to266

I'd recommend trying 7.1.2 if you can. That release contains a fix for a similar kind of issue.

aembke avatar Jan 20 '24 17:01 aembke

Closing due to inactivity, but if you still have issues here after 8.0.2 please let me know. There were ~5 potentially relevant fixes for this between 6.3.2 and 8.0.2, so hopefully those address this.

aembke avatar Feb 15 '24 22:02 aembke