valkey
Reduce the number of calls to the dict hash function during command execution
While executing a command, we call the `dictFind` function at least twice for the current key: once for the lookup (`lookupkey`) and once for the set (`dbset`). If the key is relatively long, the extra hash computation adds measurable latency.
If we use a temporary variable to store the hash value computed by the dict hash function, it might improve performance a little.
@kukey Thanks for the issue.
I think it would be best to write out the code, benchmark a few scenarios, and share the numbers. We can then look further into whether it's worth the complexity.
@hpatro Hi, I have implemented a simple draft to validate this idea.
Testing on my laptop, the benchmark results look like this.

Native valkey-server:

```
src/valkey-benchmark -q -n 1000000 set aaaaaaaaaa aa
set aaaaaaaaaa aa: 172711.58 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 173973.55 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 174886.33 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 149432.16 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 164257.56 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 173641.27 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 173340.27 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 176211.45 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 181422.34 requests per second, p50=0.135 msec
set aaaaaaaaaa aa: 178635.22 requests per second, p50=0.135 msec
set aaaaaaaaaa aa: 157977.89 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 155521.00 requests per second, p50=0.143 msec
```
Optimized valkey-server:

```
src/valkey-benchmark -p 6378 -q -n 1000000 set aaaaaaaaaa aa
set aaaaaaaaaa aa: 180603.20 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 173822.36 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 177619.89 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 176242.52 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 139997.20 requests per second, p50=0.175 msec
set aaaaaaaaaa aa: 166057.80 requests per second, p50=0.135 msec
set aaaaaaaaaa aa: 176273.58 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 177714.59 requests per second, p50=0.135 msec
set aaaaaaaaaa aa: 181851.25 requests per second, p50=0.135 msec
set aaaaaaaaaa aa: 177967.61 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 175008.75 requests per second, p50=0.143 msec
set aaaaaaaaaa aa: 162284.97 requests per second, p50=0.135 msec
set aaaaaaaaaa aa: 163025.77 requests per second, p50=0.135 msec
```
After excluding some outlier results, we can see that the optimized version performs better, with lower p50 latency and approximately a 1% improvement in QPS. But there are some problems to consider:
- multi-key commands like MGET/DEL have no need for the hash cache
- RENAME and other store-style commands likewise have no need for the hash cache

Shall we continue with this? I think it's worth researching and implementing. Thanks!
cc @valkey-io/core-team
I'm honestly not sure about introducing this behavior for a 1% gain, given the cases you have already called out and things like expiration/rehashing. It might bring in difficult-to-track bugs and lead to correctness issues. I would let the maintainers take the final call.
@kukey I think it's worth evaluating in more detail. Can you open a PR with your proposal so we can move the conversation forward? I have many of the same concerns that Hari has, but I think saving 1% is material and worth investigating.
@kukey Just realized that in the above results you used a fixed key/value pair for the benchmarking. Is that intentional? I don't think it's a practical use case. Could you instead benchmark with `-t set` and random keys, and observe the behavior:
```
src/valkey-benchmark \
  -t set \
  -n 1000000 \
  -r 1000000
```