valkey icon indicating copy to clipboard operation
valkey copied to clipboard

Reduce the number of calls to dicthash during the execution of the command

Open kukey opened this issue 1 year ago • 5 comments

During the process of executing the command, we will call the dictFind function at least twice to find the current key, once for lookupkey and once for dbset. If the key is relatively long, calling dicthash one more time will affect the latency.

If we use a temporary variable to store the hash value calculated by dicthash, It might improve the performance a little.

kukey avatar Jul 02 '24 03:07 kukey

@kukey Thanks for the issue.

I think it would be best to write out the code and benchmark few scenarios and share the numbers. We can look further if it's worth the complexity.

hpatro avatar Jul 02 '24 06:07 hpatro

@hpatro hi, I have implemented a simple draft to validate this idea.

this is a draft implement

use my notebook to test, there are some benchmark result like that: native valkey-server command: src/valkey-benchmark -q -n 1000000 set aaaaaaaaaa aa set aaaaaaaaaa aa: 172711.58 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 173973.55 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 174886.33 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 149432.16 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 164257.56 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 173641.27 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 173340.27 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 176211.45 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 181422.34 requests per second, p50=0.135 msec set aaaaaaaaaa aa: 178635.22 requests per second, p50=0.135 msec set aaaaaaaaaa aa: 157977.89 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 155521.00 requests per second, p50=0.143 msec

optimization valkey-server command: src/valkey-benchmark -p 6378 -q -n 1000000 set aaaaaaaaaa aa set aaaaaaaaaa aa: 180603.20 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 173822.36 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 177619.89 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 176242.52 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 139997.20 requests per second, p50=0.175 msec set aaaaaaaaaa aa: 166057.80 requests per second, p50=0.135 msec set aaaaaaaaaa aa: 176273.58 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 177714.59 requests per second, p50=0.135 msec set aaaaaaaaaa aa: 181851.25 requests per second, p50=0.135 msec set aaaaaaaaaa aa: 177967.61 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 175008.75 requests per second, p50=0.143 msec set aaaaaaaaaa aa: 162284.97 requests per second, p50=0.135 msec set aaaaaaaaaa aa: 163025.77 requests per second, p50=0.135 msec

After excluding some interference results, we can see that the optimized version performs better, with lower p50 latency and approximately 1% improvement in QPS.  But there are some problem to consider:

  • commands like mget/del..., no need to use hash cache
  • like rename command, and other store command, no need to use hash cache  let us continue this? I think it`s worth to research and implentment.  Thanks!

cc @valkey-io/core-team

kukey avatar Jul 05 '24 11:07 kukey

I'm strictly not sure about introducing this behavior for 1% gain and how it behaves with some of the behavior you have already called out and things like expiration/rehashing. This might bring in difficult to track bug(s) and lead to correctness issue. Would let the maintainers take the final call.

hpatro avatar Jul 24 '24 18:07 hpatro

@kukey I think it's worth evaluating in more detail, can you open a PR with your proposal and we can move the conversation more. I have many of the same concerns that Hari has, but I think saving 1% is material and worth considering investigating.

madolson avatar Jul 24 '24 22:07 madolson

@kukey Just realized in the above results you used a fixed key/value pair for the benchmarking. Is it intentional? I don't think it's a practical use case. Could you rather benchmark with -t set and observe the behavior.

src/valkey-benchmark \
 -t set \
 -n 1000000 \
 -r 1000000

hpatro avatar Jul 26 '24 17:07 hpatro