slowness on x2gd.8xlarge
Again, arm64. Specifically x2gd.8xlarge. Reproduced on ubuntu 22.04 and 20.04.
When running dragonfly with 32 threads, its 99 and 99.9 latency percentiles are very high (10-20ms).
To reproduce : memtier_benchmark --ratio 0:1 -n 10000 --hide-histogram
With --command=ping benchmark the latency is adequate so I dug in message-passing code. Indeed I saw round-trip latencies of over 10ms (Say if you measure WaitForShardCallbacks() latency).
Reducing --proactor_threads to 28, 26 helps a lot! 99.9 becomes less than 1ms.
I still do not understand why the latency is so high and whether it's a real bug in dragonfly, problems in kernel or a weird interaction on this specific hardware.
For now we have a workaround. I will need to follow up and see where latency variance becomes so scrazy.
Maybe this tweet explains it?
I run it on 64 cores on other machines without any problem. There is a chance that there is a bottleneck but I do not understand this tweet. It's not that we actually move data across threads. At the end - it's a write in one thread and a read in another. Lockless data-structures do the same.
does not reproduce for now. closing