performance improvements for pipelining

Open romange opened this issue 2 months ago • 1 comment

From running:

 ./dfly_bench  --command "zadd foo nx __score__ __data__" -d 1 -c 1 --qps 0 -n 50000000 --pipeline=5000

I saw the following:

A bottleneck in absl::GetCurrentTimeNanos().
A bottleneck in ZzlStrtod (i.e. strtod).
A bottleneck in Transaction::StoreKeysInArgs (the stub transaction in the multi squasher). More generally, InitByKeys is quite significant (8-9%), probably because it reads the argument slices in another thread (cold RAM). I do not know if we can do anything about it, just noting.
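
To make the strtod item concrete, here is a minimal sketch of one possible mitigation: a fast path for scores that are plain decimal integers, falling back to strtod for everything else. ParseScore is a hypothetical helper written for this note, not Dragonfly's ZzlStrtod, and it assumes that the benchmark scores are mostly short integers, which I have not verified. It also does not reproduce Redis-specific rules such as rejecting NaN.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of Dragonfly): parses a sorted-set score.
// Fast path: an optional sign followed by 1..15 decimal digits. Such values
// are below 2^53, so the conversion to double is exact.
// Slow path: anything else goes through strtod.
std::optional<double> ParseScore(std::string_view s) {
  if (s.empty() || s[0] == ' ' || s[0] == '\t' || s[0] == '\n')
    return std::nullopt;  // strtod would silently skip leading whitespace

  std::size_t start = 0;
  bool negative = false;
  if (s[0] == '-' || s[0] == '+') {
    negative = (s[0] == '-');
    start = 1;
  }

  const std::size_t num_chars = s.size() - start;
  if (num_chars >= 1 && num_chars <= 15) {
    std::uint64_t value = 0;
    bool all_digits = true;
    for (std::size_t i = start; i < s.size(); ++i) {
      const unsigned digit = static_cast<unsigned char>(s[i]) - '0';
      if (digit > 9) {  // also true for characters below '0' (wraps around)
        all_digits = false;
        break;
      }
      value = value * 10 + digit;
    }
    if (all_digits) {
      const double result = static_cast<double>(value);
      return negative ? -result : result;
    }
  }

  // Slow path: strtod needs a NUL-terminated buffer, so copy the view.
  const std::string buf(s);
  char* end = nullptr;
  const double result = std::strtod(buf.c_str(), &end);
  if (end != buf.c_str() + buf.size())
    return std::nullopt;  // trailing garbage or nothing parsed
  return result;
}
```

Whether this helps in practice depends on the real score distribution, and the copy in the slow path would need to go before this is anything more than an illustration.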

Oct 22 '25 06:10 romange

The same command against Valkey, started with:

docker run --network="host" --rm   --cpuset-cpus="2-3"   valkey/valkey:9.0   --save "" --appendonly no    --protected-mode no

reaches almost double the throughput.

(This scenario is not common in production; the workload is deliberately chosen to showcase the threading/networking part of both systems for a single connection.)
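
For readers who want to see what such a workload looks like on the wire, here is a rough sketch of a pipelined batch: many ZADD commands are encoded back to back in RESP and sent on one connection before any reply is read. AppendCommand and BuildPipeline are illustrative names, and the score and member values are made up; I am assuming the benchmark substitutes the placeholders with similar per-request values.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Encodes one command as a RESP array of bulk strings and appends it to out.
void AppendCommand(const std::vector<std::string_view>& args, std::string* out) {
  out->append("*").append(std::to_string(args.size())).append("\r\n");
  for (std::string_view arg : args) {
    out->append("$").append(std::to_string(arg.size())).append("\r\n");
    out->append(arg).append("\r\n");
  }
}

// Builds `depth` ZADD commands in a single buffer. A pipelining client would
// write this buffer to the socket in one go and only then read the replies.
// The score/member values below are placeholders chosen for illustration.
std::string BuildPipeline(std::size_t depth) {
  std::string batch;
  for (std::size_t i = 0; i < depth; ++i) {
    const std::string score = std::to_string(i);
    const std::string member = "m" + std::to_string(i);
    AppendCommand({"ZADD", "foo", "NX", score, member}, &batch);
  }
  return batch;
}
```

With a large depth the server sees one big read containing thousands of commands, which is why parsing and per-command overhead dominate in this test rather than network round trips.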

Oct 22 '25 06:10 romange