[BUG] Latency problem when one of the nodes is stopped in a multi-master replication setup

Open ruimgoncalves opened this issue 1 year ago • 0 comments

Describe the bug

When using KeyDB in a multi-master replication setup, if one of the nodes is stopped, all other running nodes block all operations while attempting to reconnect to the stopped one.

To reproduce

Create 3 nodes (keydb-0, keydb-1, keydb-2) and ensure each one replicates with the others.

docker-compose up

Connect test clients to each of the nodes

docker run --network keydb_default -it --rm eqalpha/keydb keydb-cli -h keydb-0 -p 6379
docker run --network keydb_default -it --rm eqalpha/keydb keydb-cli -h keydb-1 -p 6379
docker run --network keydb_default -it --rm eqalpha/keydb keydb-cli -h keydb-2 -p 6379

Ensure replication is working by issuing a simple set x TEST on one node and get x on each of the others.
Stop one of the nodes (e.g., keydb-1).
Verify that commands issued on the active nodes block until the reconnection attempt to the offline node times out.
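
To put a number on the stall rather than judging it by eye, the steps above can be sketched as a small script. This is only a rough sketch: `elapsed_ms` and `measure_survivors` are hypothetical helper names, not part of KeyDB or Docker, and the script assumes GNU `date` (for `%N`), the `keydb_default` network, and the service names from the compose file in this issue.

```shell
#!/usr/bin/env bash
# Sketch: time a write on each surviving node after one node is stopped.
# Assumes GNU date (%N gives nanoseconds) and the compose file from this issue.

# Print how long a command takes to run, in milliseconds.
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1 || true   # discard the command's output and exit status
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Hypothetical helper: stop one service, then time a SET on the given hosts.
# Usage: measure_survivors STOPPED_SERVICE HOST...
measure_survivors() {
  local stopped=$1
  shift
  docker-compose stop "$stopped"
  local host ms
  for host in "$@"; do
    ms=$(elapsed_ms docker run --network keydb_default --rm eqalpha/keydb \
      keydb-cli -h "$host" -p 6379 set x TEST)
    echo "$host: $ms ms"
  done
}

# Example call (run from the directory containing docker-compose.yml):
#   measure_survivors keydb-1 keydb-0 keydb-2
```

Each measurement includes the start-up time of the throwaway client container, so it is worth running the same `elapsed_ms docker run ...` command once while all three nodes are up and comparing against that baseline.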

Expected behavior

The active nodes should not block operations while attempting to reconnect to offline nodes.

Additional information

docker-compose.yml

version: '3.8'

services:
  keydb-0:
    image: eqalpha/keydb
    hostname: keydb-0
    command: keydb-server --active-replica yes --multi-master yes --replicaof keydb-1 6379 --replicaof keydb-2 6379
  keydb-1:
    image: eqalpha/keydb
    hostname: keydb-1
    command: keydb-server --active-replica yes --multi-master yes --replicaof keydb-0 6379 --replicaof keydb-2 6379
  keydb-2:
    image: eqalpha/keydb
    hostname: keydb-2
    command: keydb-server --active-replica yes --multi-master yes --replicaof keydb-0 6379 --replicaof keydb-1 6379

Jan 04 '25 16:01 ruimgoncalves