node-redis scanIterator on RedisSentinel

Motivation

Currently, RedisSentinel does not implement scan iterators (tested on @redis/client 5.5.6).

The iterator is not available from a master lease either.

Even if the code is easy to replicate, this could be a nice addition.

Thank you!

Jun 17 '25 16:06 benoitvidis

Hi @benoitvidis, thanks for flagging this. We will consider adding it. Would you be willing to contribute?

Jun 18 '25 07:06 nkaradzhov

Hello, I'm very interested in implementing scanIterator support for the RedisSentinel client and would like to contribute to this feature. It's a key requirement for us as we're migrating our application from a single Redis master to a high-availability Sentinel setup, and we rely on this functionality extensively. My initial investigation into the codebase suggests that using acquire() could be a good way to ensure the iteration happens against a stable master node. My proposed solution would look something like this conceptually:

async* scanIterator(options) {
  // Acquire a client lease to ensure all SCAN commands go to the same master
  const masterClient = await this.acquire();
  try {
    let cursor = '0';
    do {
      const reply = await masterClient.scan(cursor, options);
      cursor = reply.cursor;
      yield reply.keys;
    } while (cursor !== '0');
  } finally {
    // Ensure the lease is always released
    masterClient.release();
  }
}

As I continue to explore the source code to understand the implications of this approach, I'm focusing on two key areas. I will keep digging into these, but I would appreciate any guidance you can offer on the questions below:

Failover Behaviour: How does a client lease from acquire() behave during a master failover? For example, if the master goes down mid-iteration, will the leased client throw an error (which is what I assume), or is there a mechanism that lets it gracefully reconnect to the newly promoted master?

Replica Reads: A huge benefit would be to offload these read-only SCAN operations to replicas. For this to work with an iterator, a lease would need to be "sticky" to a single replica for the duration of the scan. I'm looking into how replica clients are managed in the connection pool to see if acquiring a stable lease on a specific replica is currently possible.

Thanks!

Aug 29 '25 19:08 harshrai654

@harshrai654,

Failover behavior of acquire(): the lease survives a master failover. This sounds good, but a scan cursor is node-specific, meaning a cursor created on one node doesn't work on another node.
- Your SCAN is effectively broken by the failover. You cannot resume it safely on the new master.
- If you need a complete keyspace traversal across failovers, you must restart SCAN from cursor 0 on the new master.
Replica lease: Not available today. You can fabricate one by creating a standalone RedisClient to a specific replica host:port (from sentinel.getReplicaNodes()), but you then own reconnect/retarget behavior.

Details and mitigations

Master lease (via sentinel.acquire())
- Behavior:
  - All SCAN calls stick to one master. On failover, the internal rediscovery retries the command on the promoted master (bounded by maxCommandRediscovers).
  - SCAN is not cursor-portable across nodes; after failover you may see duplicates or miss some keys.
- Mitigate duplicates/misses:
  - Subscribe to topology-change and, if a MASTER_CHANGE occurs mid-iteration, restart from cursor 0 and deduplicate already-seen keys (Set or Bloom filter), or accept eventual consistency and tolerate dupes.
  - Keep your processing idempotent.
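
A minimal sketch of the Set-based deduplication mentioned above, for the case where a scan restarts from cursor 0 and re-yields keys. `dedupeKeys`, `restartedScan`, and `collect` are hypothetical helper names for illustration, not part of node-redis. The wrapper assumes the underlying iterator yields arrays of keys.

```javascript
// Hypothetical helper (not a node-redis API): drops keys that were already
// yielded, including keys re-yielded after a restart from cursor 0.
async function* dedupeKeys(batches) {
  const seen = new Set();
  for await (const batch of batches) {
    const fresh = [];
    for (const key of batch) {
      if (!seen.has(key)) {
        seen.add(key);
        fresh.push(key);
      }
    }
    if (fresh.length > 0) yield fresh;
  }
}

// Stand-in for a scan that restarted after a failover: "a" and "b" appear twice.
async function* restartedScan() {
  yield ['a', 'b'];
  yield ['a', 'b', 'c'];
}

// Flattens all batches into a single array.
async function collect(batches) {
  const keys = [];
  for await (const batch of batches) keys.push(...batch);
  return keys;
}

collect(dedupeKeys(restartedScan())).then((keys) => {
  console.log(keys); // logs [ 'a', 'b', 'c' ]
});
```

The Set grows with the number of distinct keys, so for very large keyspaces the Bloom filter option or idempotent processing may be the better fit.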
Sticky replica (self-created client)
- How:
  - Pick a replica from sentinel.getReplicaNodes() and create a direct client to that host:port with RedisClient.create(). Run your iterator there.
- Tradeoffs:
  - No automatic failover or retargeting; if that replica dies or is promoted, your cursor is effectively invalid.
  - Replica lag/resync can cause duplicates/misses even without failover.
  - All scan load is pinned to one replica.
- Mitigations:
  - On any disconnect or relevant topology-change (REPLICA_REMOVE or MASTER_CHANGE affecting your node), abort, pick a new replica, restart from cursor 0, and dedupe.
  - Use bounded backoff; after N failed reconnects, switch to another replica.
  - Keep work idempotent.
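
The replica mitigations above could be sketched roughly as follows. This is illustrative only: `scanAnyReplica` and the caller-supplied `scanNode(node)` are hypothetical names, and `scanNode` is assumed to connect to one replica, run a full scan from cursor 0, and reject on any disconnect. The defaults for attempts and delays are arbitrary.

```javascript
// Hypothetical helper (not a node-redis API). Tries each replica in turn,
// retrying with bounded exponential backoff before moving to the next one.
async function scanAnyReplica(
  replicas,
  scanNode,
  { maxAttempts = 3, baseDelayMs = 100, maxDelayMs = 2000 } = {}
) {
  let lastError;
  for (const node of replicas) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        // Each attempt is a fresh scan from cursor 0; partial results
        // from a failed attempt are discarded by scanNode's rejection.
        return await scanNode(node);
      } catch (err) {
        lastError = err;
        if (attempt < maxAttempts - 1) {
          // Bounded backoff: doubles per attempt, capped at maxDelayMs.
          const delay = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
          await new Promise((resolve) => setTimeout(resolve, delay));
        }
      }
    }
    // All attempts on this replica failed: fall through to the next one.
  }
  throw lastError ?? new Error('no replicas available');
}
```

Because each attempt returns a complete result or nothing, this variant needs no deduplication, at the cost of buffering all keys in memory. A streaming variant would have to dedupe across restarts instead.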

Sep 04 '25 07:09 nkaradzhov

Thank you @nkaradzhov for the incredibly detailed and helpful response. Based on your feedback, I agree that running the iterator against the master node is the best path forward. My reasoning is:

Built-in Failover: The acquire() lease already provides robust failover for the master node, which is a significant advantage.
Replica Consistency: Running SCAN on a replica is problematic due to replication lag, which can cause inconsistent results. The non-portable cursor is a limitation for any failover.

By initially supporting a master-only scanIterator, we can leverage the client's existing strengths. We can explore supporting replica scans in the future, perhaps via a config option for users who can tolerate the trade-offs.

With this, my proposed implementation plan is:

Acquire a client lease on the master for the iterator's duration.
Internally, listen for the topology-change event. Upon a MASTER_CHANGE, the iterator will automatically restart the scan from cursor 0 on the new master.
Document the restart behavior: state that in the event of a failover the iterator may yield duplicate keys, and that it is the user's responsibility to make their processing idempotent or to deduplicate.
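
As a rough sketch of the plan above (not a final implementation): `scanWithRestart` is a hypothetical name, and `sentinel` is assumed to expose on/off for the topology-change event plus acquire(), which returns a lease with scan() and release(). The `{ type: 'MASTER_CHANGE' }` event payload shape is also an assumption.

```javascript
// Sketch only. Restarts the scan from cursor 0 whenever a MASTER_CHANGE
// is observed, so consumers may see duplicate keys after a failover.
async function* scanWithRestart(sentinel, options) {
  let masterChanged = false;
  const onTopologyChange = (event) => {
    // Assumed payload shape: { type: 'MASTER_CHANGE', ... }
    if (event && event.type === 'MASTER_CHANGE') masterChanged = true;
  };
  sentinel.on('topology-change', onTopologyChange);

  let lease;
  try {
    lease = await sentinel.acquire();
    let cursor = '0';
    while (true) {
      const reply = await lease.scan(cursor, options);
      if (masterChanged) {
        // The cursor may belong to the old master: discard this reply
        // and start over on the new master.
        masterChanged = false;
        cursor = '0';
        continue;
      }
      if (reply.keys.length > 0) yield reply.keys;
      cursor = reply.cursor;
      if (cursor === '0') break;
    }
  } finally {
    // Runs on completion, on error, and when the consumer stops early.
    sentinel.off('topology-change', onTopologyChange);
    if (lease) lease.release();
  }
}
```

Checking the flag after each SCAN reply costs at most one wasted call per failover, but it also covers a master change that happens while the consumer is processing a batch.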

Does this plan sound right?

Sep 06 '25 13:09 harshrai654

Has any progress been made on this addition? We would really appreciate this feature. Thanks!

Nov 17 '25 08:11 aarond-sp