fred.rs icon indicating copy to clipboard operation
fred.rs copied to clipboard

[Bug] `.replicas()` doesn't work with centralized Redis

Open banool opened this issue 1 year ago • 3 comments

Fred version - 9.0.3 Redis version - 7.0.14 Platform - mac Deployment type - centralized

Describe the bug Centralized client doesn't work with .replicas().

To Reproduce I have code like this:

let options = self.redis_pool.options();
let value: Option<String> = self
    .redis_pool
    // We don't care if the data from the cache reader is a bit stale, it is
    // a good candidate for using replicas instead. If there are no replicas,
    // the client will just use a normal node.
    .replicas()
    // We need to reapply the options because `.replicas()` does not keep the
    // WithOptions wrapper: https://github.com/aembke/fred.rs/issues/256.
    .with_options(options)
    .get(redis_key)
    .await
    .inspect_err(|e| error!(error = ?e, key=redacted_key, "Failed to get key from Redis"))
    .with_context(|| format!("Failed to get key {}", redacted_key))?;

Depending on some prior configuration, I build the pool with either from_url_clustered or from_url_centralized. When I use a client configured for centralized Redis, .replicas() hangs until the configured timeout and then returns an error like this:

Redis Error - kind: Timeout, details: Request timed out.

With a clustered mode client, .replicas() says All commands sent via this interface will use a replica node, if possible. I suppose it makes sense that replicas() doesn't work with normal Redis (since it's a clustered concept) but it'd be nice if, with a centralized mode client, .replicas() would just be a no op. Or .replicas() just wouldn't be possible to use with a centralized client (though that might be tricky given there is only a single client class). Currently it's a bit of a footgun.

Logs With that tracing level I didn't see any additional logging. It was probably misconfigured, I can try again to get the logs if you need them.

banool avatar Jul 05 '24 15:07 banool

Can you show the ReplicaConfig that you used to repro the timeout? There are integration tests for this that run without issues against a primary - replica pair when not in cluster mode. Is it possible your replica node was just unreachable at the time?

aembke avatar Jul 08 '24 20:07 aembke

How about a primary with no replica, do we expect that to work? If so, I can do further testing locally to verify again.

banool avatar Jul 08 '24 22:07 banool

Yeah that should work as well. When the client connects to the primary it calls ROLE to inspect any replicas, and if none are found it creates an empty replica routing table. If you try to use replicas() when there's no records in the routing table then it should succeed or fail based on whether primary_fallback is true/false, but I wouldn't expect to see a timeout either way.

aembke avatar Jul 08 '24 22:07 aembke