slowness on x2gd.8xlarge

Open romange opened this issue 3 years ago • 2 comments

Again, arm64. Specifically x2gd.8xlarge. Reproduced on Ubuntu 22.04 and 20.04.

When running dragonfly with 32 threads, its p99 and p99.9 latencies are very high (10-20ms). To reproduce: memtier_benchmark --ratio 0:1 -n 10000 --hide-histogram

With the --command=ping benchmark the latency is adequate, so I dug into the message-passing code. Indeed, I saw round-trip latencies of over 10ms (for example, if you measure the latency of WaitForShardCallbacks()).
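For anyone who wants to check the cross-thread round trip in isolation, here is a minimal standalone sketch. It is not Dragonfly's message-passing code and does not call WaitForShardCallbacks(); the names MeasureRoundTrips and Percentile are hypothetical helpers made up for this example. It assumes a plain two-thread ping-pong over atomics is a reasonable stand-in for one hop: one thread writes a sequence number, the other reads it and acknowledges, and the caller times each round trip.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: nearest-rank percentile, p in (0, 100].
// Returns 0 for an empty sample set.
double Percentile(std::vector<double> samples, double p) {
  assert(p > 0.0 && p <= 100.0);
  if (samples.empty()) return 0.0;
  std::sort(samples.begin(), samples.end());
  std::size_t rank =
      static_cast<std::size_t>(std::ceil(p / 100.0 * samples.size()));
  rank = std::min(std::max<std::size_t>(rank, 1), samples.size());
  return samples[rank - 1];
}

// Hypothetical helper: times `iters` round trips between the calling thread
// and a worker thread. Returns one sample per round trip, in microseconds.
std::vector<double> MeasureRoundTrips(int iters) {
  std::atomic<int> request{0};
  std::atomic<int> reply{0};

  // Worker: waits for request i, then acknowledges it.
  std::thread worker([&] {
    for (int i = 1; i <= iters; ++i) {
      while (request.load(std::memory_order_acquire) != i)
        std::this_thread::yield();
      reply.store(i, std::memory_order_release);
    }
  });

  std::vector<double> usec;
  usec.reserve(iters > 0 ? iters : 0);
  for (int i = 1; i <= iters; ++i) {
    auto start = std::chrono::steady_clock::now();
    request.store(i, std::memory_order_release);  // write in one thread
    while (reply.load(std::memory_order_acquire) != i)  // wait for the ack
      std::this_thread::yield();
    auto end = std::chrono::steady_clock::now();
    usec.push_back(
        std::chrono::duration<double, std::micro>(end - start).count());
  }
  worker.join();
  return usec;
}
```

Calling MeasureRoundTrips(100000) and then Percentile(samples, 99.9) gives a rough tail-latency figure for a single hop. Pinning the two threads to specific cores and repeating across core pairs would be the natural next step, but that is left out of this sketch.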

Reducing --proactor_threads to 28 or 26 helps a lot! p99.9 becomes less than 1ms. I still do not understand why the latency is so high and whether it's a real bug in dragonfly, a kernel problem, or a weird interaction on this specific hardware.

For now we have a workaround. I will need to follow up and see where the latency variance becomes so crazy.

romange avatar May 10 '22 09:05 romange

Maybe this tweet explains it?

tomazz75 avatar May 31 '22 19:05 tomazz75

I run it on 64 cores on other machines without any problem. There is a chance that there is a bottleneck, but I do not understand this tweet. It's not that we actually move data across threads. In the end, it's a write in one thread and a read in another. Lockless data structures do the same.

romange avatar May 31 '22 20:05 romange

Does not reproduce for now. Closing.

romange avatar Apr 29 '23 13:04 romange