Dragonfly latency spikes during full sync replication
Describe the bug
When setting up a DragonflyDB replica to sync from a primary instance with 32 million keys, we observe significant latency spikes (p99 response time jumps from 4ms to 50ms) on the primary during the full sync process. This issue persists even under moderate load conditions (around 132K ops/sec). The latency spikes are detrimental to our application's performance and user experience.
To Reproduce
Steps to reproduce the behavior:
- Start a DragonflyDB primary instance with the command below:

  docker run --rm --name dfly0 -d -p 16379:6379 --ulimit memlock=-1 --cpus=64 docker.dragonflydb.io/dragonflydb/dragonfly:v1.35.1 --cache_mode=true --maxmemory=64g --point_in_time_snapshot=false --background_snapshotting=false --dbfilename=

- Populate the primary with 32 million keys using a data population script (see the Reproducible Code Snippet below).
- Start a DragonflyDB replica instance with the command below:

  docker run --rm --name dfly1 -d -p 26379:6379 --ulimit memlock=-1 --cpus=64 docker.dragonflydb.io/dragonflydb/dragonfly:v1.35.1 --cache_mode=true --maxmemory=64g --point_in_time_snapshot=false --background_snapshotting=false --dbfilename=

- Use memtier_benchmark to generate a load of around 244K ops/sec on the primary as below:

  docker run --rm -d -v /$(pwd):/home redislabs/memtier_benchmark -s <primary_ip> -p 16379 --data-size=256 --command="MGET __key__ __key__ __key__ __key__ __key__" --command-ratio=10 --command="SET __key__ __data__" --command-ratio=1 --test-time=60 --rate-limiting=660 --json-out-file=/home/memtier_out.json

- Configure the replica to sync from the primary:

  redis-cli -h <replica_ip> -p 26379 replicaof <primary_ip> 16379

- Monitor the p99 latency on the primary using the memtier_benchmark output.
Expected behavior
The p99 latency on the primary should remain stable or show minimal increase during the replica's full sync process, ideally staying below 10ms.
Screenshots
Environment (please complete the following information):
- OS: Ubuntu 22.04
- Kernel: Linux as-fscache-3043 5.15.0-157-generic #167-Ubuntu SMP Wed Sep 17 21:35:53 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
- Containerized?: Yes, Docker 28.5.1
- Dragonfly Version: v1.35.1
Reproducible Code Snippet
Script to populate data (populate_data.py):
import redis
import os
import base64

# ---------- Config ----------
REDIS_HOST = '10.143.160.122'
REDIS_PORT = 16379
DB = 0
TOTAL_KEYS = 32_000_000
BATCH_SIZE = 10_000  # increase batch size
KEY_PREFIX = b"rand:"  # use bytes directly

# ---------- Helpers ----------
# Faster random generator using os.urandom + base64
def random_bytes(n: int) -> bytes:
    # base64 expands by ~4/3; trim to exactly n
    return base64.urlsafe_b64encode(os.urandom(int(n * 0.8)))[:n]

def random_key_bytes(length: int = 16) -> bytes:
    return KEY_PREFIX + random_bytes(length)

def random_value_bytes(length: int = 1024) -> bytes:
    return random_bytes(length)

# ---------- Main ----------
def main():
    # decode_responses=False to avoid encoding/decoding overhead
    r = redis.Redis(
        host=REDIS_HOST,
        port=REDIS_PORT,
        db=DB,
        decode_responses=False,
    )
    # transaction=False to avoid MULTI/EXEC
    pipe = r.pipeline(transaction=False)
    for i in range(1, TOTAL_KEYS + 1):
        k = random_key_bytes()
        v = random_value_bytes()
        pipe.set(k, v)
        if i % BATCH_SIZE == 0:
            pipe.execute()
            # print(f"Inserted {i} keys...")
    # flush remaining
    pipe.execute()
    print("Done.")

if __name__ == "__main__":
    main()
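To run the script: install redis-py (pip install redis), point REDIS_HOST/REDIS_PORT at the primary started above, then run python3 populate_data.py.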
Additional context
I found another issue report that seems related: #4787, which discusses DragonflyDB being unresponsive during full sync replication. It looks like that issue is already fixed, but the latency spike problem still persists in our case.
Can you please try running the master with --compression_mode=0 and see if it helps?
I tried, but it looks like it doesn't help.
How many CPUs are there on the master and replica? Can you provide the htop output of the master during that time?
Maybe also worth trying serialization_max_chunk_size=1. The bench does not use big values per se, but it might make the snapshot fiber yield more, as it will now flush on every serialized entry.
> How many CPUs are there on the master and replica? Can you provide the htop output of the master during that time?

I allocate 64 CPU cores for each; during full sync, the CPU usage of the master can reach 3500%-4000%.
> Maybe also worth trying serialization_max_chunk_size=1. The bench does not use big values per se, but it might make the snapshot fiber yield more, as it will now flush on every serialized entry.

Looks like this doesn't help: although the CPU usage during full sync is low (~300%), the memtier_benchmark result is even worse.
Also tried with different configs:
- Separate admin port (--admin_port=16380): still has the issue
- Adjusted threads (--proactor_threads=48 --conn_io_threads=16 --conn_use_incoming_cpu=false): still has the issue
@jojoxhsieh how do you monitor QPS and latency from the memtier output?
> @jojoxhsieh how do you monitor QPS and latency from the memtier output?

I parse the result from --json-out-file=/home/memtier_out.json into a CSV file and then visualize it as a chart.
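For illustration, a minimal sketch of that parsing step (the JSON layout differs between memtier versions, so the key names used here - "ALL STATS", "Sets", "Time-Serie", "Count", "p99.00" - are assumptions to check against your own memtier_out.json):

```python
# Sketch only: flatten memtier_benchmark's JSON report into a per-second CSV.
# The key names below ("ALL STATS", "Sets", "Time-Serie", "Count", "p99.00")
# are assumptions about the report layout; verify them against your own
# memtier_out.json before relying on this.
import csv
import json

IN_PATH = "memtier_out.json"
OUT_PATH = "memtier_out.csv"

with open(IN_PATH) as f:
    report = json.load(f)

# Assumed layout: a per-second time series for the SET command, keyed by
# elapsed second, each entry holding counters and latency percentiles.
series = report["ALL STATS"]["Sets"]["Time-Serie"]

with open(OUT_PATH, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["second", "ops_sec", "p99_latency_ms"])
    for second, stats in sorted(series.items(), key=lambda kv: int(kv[0])):
        writer.writerow([second, stats.get("Count"), stats.get("p99.00")])

print("Wrote", OUT_PATH)
```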
Did you run the replica on a different VM from the master? Did you run memtier on a different VM as well? Overall you should use 3 nodes.
> Did you run the replica on a different VM from the master?

On the same VM, just in a different container.

> Did you run memtier on a different VM as well? Overall you should use 3 nodes.

No, all on the same VM, but I can try to use separate VMs to verify again.
If you run on the same VM, your CPU threads are contended by master/replica/memtier. Dragonfly does not like to share its CPU time. You can run the master/replica processes on the same VM by using taskset with disjoint CPU ranges, but memtier must run on a different machine. For more details about the benchmarking methodology, please see: https://www.dragonflydb.io/docs/getting-started/benchmark
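For example (assuming a 64-core host as in the repro above), the two containers could be pinned to disjoint ranges with --cpuset-cpus=0-31 for the master and --cpuset-cpus=32-63 for the replica (or, for bare processes, taskset -c 0-31 and taskset -c 32-63), while memtier_benchmark runs on a separate machine.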
Having said that, I confirm a 3x latency increase during the full sync. I needed https://github.com/dragonflydb/dragonfly/pull/6153 to measure this server-side.
Another note: the populate step can be replaced with debug populate 32000000 key 1024 RAND, which is simpler and faster.
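For example, against the primary from the repro steps, that would be something like: redis-cli -h <primary_ip> -p 16379 debug populate 32000000 key 1024 RAND.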
I ran on 3 VMs with 16 CPUs each. These are the graphs that memtier recorded. You can see some latency increase as well as latency spikes for SETs, but nothing as drastic as what you attached: latency_chart.html
I used my original reproduction steps with --cpuset-cpus to separate the CPUs for master/replica, and ran memtier_benchmark on another machine, but the result is still not good.
Hi, @romange, can you share the memtier_benchmark command you use to test?
I used debug populate 32000000 key 1024 RAND to populate the data instead, but still got the same result.
Can you share the commands and steps you used to verify?
BTW, I also tried populating data with different value sizes (from 64 to 1024), and the result is the same:
populate 32000000 key 64 RAND
populate 32000000 key 128 RAND
populate 32000000 key 256 RAND
populate 32000000 key 512 RAND
populate 32000000 key 1024 RAND
Hi, any update?