Optimize hgetall for large hash table
Although hgetall is inherently slow and should be used with caution, there are cases where it's necessary to fetch everything at once. This patch optimizes that code path.
@sunxiaoguang This looks like an optimization specific to the OBJ_ENCODING_HT encoding. Could you give a concrete percentage for the improvement?
@soloestoy Here is the comment I posted on the original pull request against the vanilla version; I've copied it here for your convenience. A hash is converted to dict encoding once it grows beyond a configurable threshold, 512 entries by default. Iterating such a large hash table through the iteration API takes quite a few CPU cycles and generates more cache misses than necessary. This merge request uses a dedicated iterate-all function to traverse the whole dict when generating the hgetall, hkeys and hvals responses, which reduces latency by roughly 1~5% depending on the size of the dict.
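For readers who want to see the shape of the idea, below is a minimal, self-contained sketch of an "iterate all" traversal. The types and names here (hashtable, entry, iterate_all, emit_field) are hypothetical stand-ins, not the actual Redis dict API; the point is only that a single tight loop over the bucket array with a callback touches memory more predictably than pulling entries one at a time through an iterator object.

```c
/* Sketch only: a simplified chained hash table and a callback-driven
 * "iterate all" pass over its buckets. Hypothetical types, not Redis code. */
#include <stdio.h>

typedef struct entry {
    const char *key;
    const char *val;
    struct entry *next;      /* entries chained in the same bucket */
} entry;

typedef struct hashtable {
    entry **buckets;         /* bucket array */
    size_t size;             /* number of buckets */
} hashtable;

typedef void (*entry_cb)(void *privdata, const char *key, const char *val);

/* Visit every entry in one pass over the bucket array and invoke the
 * callback for each, returning the number of entries visited. */
static size_t iterate_all(hashtable *ht, entry_cb cb, void *privdata) {
    size_t visited = 0;
    for (size_t i = 0; i < ht->size; i++) {
        for (entry *e = ht->buckets[i]; e != NULL; e = e->next) {
            cb(privdata, e->key, e->val);
            visited++;
        }
    }
    return visited;
}

/* Example callback: emit key and value, as an hgetall-style reply would. */
static void emit_field(void *privdata, const char *key, const char *val) {
    (void)privdata;
    printf("%s => %s\n", key, val);
}

int main(void) {
    /* Build a tiny two-bucket table by hand just to exercise iterate_all(). */
    entry e2 = { "field2", "value2", NULL };
    entry e1 = { "field1", "value1", &e2 };   /* two entries chained in bucket 0 */
    entry e3 = { "field3", "value3", NULL };  /* one entry in bucket 1 */
    entry *buckets[2] = { &e1, &e3 };
    hashtable ht = { buckets, 2 };

    size_t n = iterate_all(&ht, emit_field, NULL);
    printf("visited %zu entries\n", n);
    return 0;
}
```

Compared with an iterator, this keeps all bookkeeping in registers inside one loop and avoids a function call and iterator-state update per entry, which is where the cache and cycle savings described above come from.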
The code to generate test data and benchmark the difference can be found here. Please use the 'optimize_hgetall_unstable_comparaison' branch, which contains commands for both the original and the new implementation for reference.
Some test runs of the benchmark program on an E5-2670 v3 server show that, for a hash with 16000 fields (the h5 test), the new approach saves a couple hundred microseconds on average per hgetall call. Running the same test on servers with less cache shows even larger improvements, since the iterator approach accesses data in a more scattered way and therefore makes less efficient use of the cache.