
`[MasterReplica] ReadFrom.REPLICA_PREFERRED fails entirely when first replica is unreachable, even before Sentinel marks it +sdown`

Open xiongrl opened this issue 2 months ago • 2 comments


Labels (suggested): bug, area:master-replica, area:sentinel, status:triage


Describe the bug

When using ReadFrom.REPLICA_PREFERRED with Sentinel-managed Master-Replica topology, if the first selected replica becomes unreachable (e.g. network partition, container restart), MasterReplicaConnectionProvider#getConnectionAsync(ConnectionIntent.READ) fails immediately with RedisConnectionException, even though:

  • Other replicas are healthy and responsive.
  • Sentinel has not yet marked the failed replica as +sdown (subjective down).

This creates a critical availability gap: the client gives up on reading before the cluster coordination layer (Sentinel) has even declared the node down, defeating the purpose of high-availability read scaling.
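For reference, a minimal setup along these lines is what exercises this code path (a sketch only; host, port, master name, and key are placeholders, not values from the report):

import io.lettuce.core.ReadFrom;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.codec.StringCodec;
import io.lettuce.core.masterreplica.MasterReplica;
import io.lettuce.core.masterreplica.StatefulRedisMasterReplicaConnection;

public class ReplicaPreferredExample {

    public static void main(String[] args) {
        RedisClient client = RedisClient.create();

        // Sentinel-managed topology; host, port and master name are placeholders.
        RedisURI sentinelUri = RedisURI.Builder.sentinel("sentinel-host", 26379, "mymaster").build();

        StatefulRedisMasterReplicaConnection<String, String> connection =
                MasterReplica.connect(client, StringCodec.UTF8, sentinelUri);

        // Prefer replicas for reads; fall back to the master only if no replica is known.
        connection.setReadFrom(ReadFrom.REPLICA_PREFERRED);

        // With one replica unreachable (and not yet +sdown), this read fails with
        // RedisConnectionException instead of being served by the healthy replica.
        String value = connection.sync().get("some-key");
        System.out.println(value);

        connection.close();
        client.shutdown();
    }
}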


Expected behavior

The Lettuce client should:

  1. Tolerate transient or partial node failures during connection selection.
  2. Continue attempting other replicas in the REPLICA_PREFERRED selection list.
  3. Only fall back to master if all replicas fail to connect.
  4. Respect Sentinel’s eventual consistency — do not preemptively fail reads just because one node is slow to respond.

In short: Client-level connection should be more resilient than Sentinel’s quorum-based failure detection.


Current behavior

In MasterReplicaConnectionProvider#getConnectionAsync, the candidate connections for a READ intent are assembled by concatenating one Mono per node in the selection:

for (RedisNodeDescription node : selection) {
    connections = connections.concatWith(Mono.fromFuture(getConnection(node)));
}

xiongrl commented on Nov 06 '25

Hey @xiongrl,

we will get back to you ASAP. In the meantime, if you can provide a minimal reproducible example, that would greatly speed up the process. Thanks!

tishun commented on Nov 26 '25

Hi @tishun,

Thank you for the quick response! Below is a complete and precise description of the bug I'm seeing in production; it is 100% reproducible.

Real Production Symptom (exact scenario)

  • Redis topology: Sentinel HA, 1 master + 2 slaves
  • Lettuce configuration: client.setReadFrom(ReadFrom.REPLICA_PREFERRED)
  • One slave (e.g. 10.15.32.68) experiences a temporary network glitch or is being restarted
  • Sentinel has NOT yet marked it +sdown (still within down-after-milliseconds, usually 30 s)
    SENTINEL slaves mymaster still returns both slaves
    → The other slave is 100% healthy
  • Lettuce topology discovery returns:
    selection = [slave1 (faulty), slave2 (healthy)]
    (slave1 ranks first because of nodeId alphabetical ordering)
  • Any read command instantly throws:

io.lettuce.core.RedisConnectionException: Unable to connect to 10.15.32.68:6379
    at io.lettuce.core.masterreplica.MasterReplicaConnectionProvider.getConnectionAsync(MasterReplicaConnectionProvider.java:...)

Result: Even though a perfectly healthy slave exists and Sentinel has not declared the first one down, all reads fail immediately. This creates a severe availability gap during rolling restarts, network flaps, etc.

Root Cause — Confirmed in Lettuce 6.3.2 / 6.4.x Source

ReadFrom.REPLICA_PREFERRED is implemented as:

public static final ReadFrom REPLICA_PREFERRED = new ReadFromImpl.ReadFromReplicaPreferred();

static final class ReadFromReplicaPreferred extends OrderedPredicateReadFromAdapter {
    ReadFromReplicaPreferred() {
        super(IS_REPLICA, IS_UPSTREAM);
    }
}

Because REPLICA_PREFERRED is an OrderedPredicateReadFromAdapter, OrderingReadFromAccessor.isOrderSensitive(readFrom) returns true, so MasterReplicaConnectionProvider#getConnectionAsync takes the order-sensitive selection branch:

if (OrderingReadFromAccessor.isOrderSensitive(readFrom) || selection.size() == 1) {
    return connections.filter(StatefulConnection::isOpen)
                      .next()                     // should try nodes in order
                      .switchIfEmpty(connections.next())
                      .toFuture();
}
BUT the Flux is built like this:

Flux<StatefulRedisConnection<K, V>> connections = Flux.empty();
for (RedisNodeDescription node : selection) {
    connections = connections.concatWith(Mono.fromFuture(getConnection(node)));
}

Flux.concatWith short-circuits on the first error.
When getConnection(slave1) fails → the first Mono errors → the entire Flux terminates → slave2 is never attempted → .next() never sees a successful connection → exception is thrown directly.
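A standalone Reactor sketch of the same shape (placeholder values, not Lettuce code) makes the termination visible:

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class ConcatErrorDemo {

    public static void main(String[] args) {
        // Stand-ins for getConnection(slave1) and getConnection(slave2).
        Mono<String> faultySlave1 = Mono.error(new RuntimeException("Unable to connect to 10.15.32.68:6379"));
        Mono<String> healthySlave2 = Mono.just("connection-to-slave2");

        Flux<String> connections = Flux.<String>empty()
                .concatWith(faultySlave1)
                .concatWith(healthySlave2);

        // Prints "failed: Unable to connect to 10.15.32.68:6379" -- the healthy source
        // is never subscribed because the error terminates the concatenated Flux.
        connections.next().subscribe(
                value -> System.out.println("selected: " + value),
                error -> System.out.println("failed: " + error.getMessage()));
    }
}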
Conclusion: The intended “try the next node on failure” behavior is structurally unreachable due to concatWith semantics.
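For illustration only (a sketch, not a proposed patch): if each candidate swallowed its own connect failure before concatenation, later nodes would still be tried, which is the behavior described under "Expected behavior" above. Reusing the placeholder Monos from the previous sketch:

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class PerNodeFallbackDemo {

    public static void main(String[] args) {
        Mono<String> faultySlave1 = Mono.error(new RuntimeException("Unable to connect to 10.15.32.68:6379"));
        Mono<String> healthySlave2 = Mono.just("connection-to-slave2");

        // Each per-node Mono turns its own failure into an empty signal, so the
        // concatenation moves on to the next candidate instead of terminating.
        Flux<String> connections = Flux.<String>empty()
                .concatWith(faultySlave1.onErrorResume(e -> Mono.empty()))
                .concatWith(healthySlave2.onErrorResume(e -> Mono.empty()));

        // Prints "selected: connection-to-slave2".
        connections.next().subscribe(
                value -> System.out.println("selected: " + value),
                error -> System.out.println("failed: " + error.getMessage()));
    }
}

A real change would still need to surface a meaningful error when every candidate fails (otherwise the all-empty Flux completes silently), but this shows the short-circuit comes from how the Flux is assembled rather than from ReadFrom itself.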

xiongrl commented on Nov 27 '25