Optimize CRC16 using multi-byte LUT
Optimize CRC16 using multi-byte LUT.
Hi @zuiderkwast ,
I would like to share some benchmark results of this PR. If we are going to merge this PR, there are at least two works waiting for finishing:
- jump-table for reduction
- code tidy (e.g., using macro)
Environment
- CPU: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz
- OS: Linux 5.15.0-157-generic
- For all "reduction", it reduces from the target length to 1-byte (e.g., 8-byte: 8 -> 4 -> 2 -> 1)
Benchmark Results
baseline (commit f54818cc60597e9fe5dc03a52fd39ab944cd4932 on unstable branch
# -t get,set
SET: 292902.97 requests per second, p50=1.431 msec
GET: 350557.41 requests per second, p50=1.199 msec
# -t get
GET: 454277.00 requests per second, p50=0.975 msec
2-byte reduction
# -t set,get
SET: 321502.06 requests per second, p50=1.351 msec
GET: 387236.69 requests per second, p50=1.151 msec
# -t get
GET: 453309.16 requests per second, p50=0.959 msec
4-byte reduction
# -t set,get
SET: 324191.12 requests per second, p50=1.335 msec
GET: 376364.31 requests per second, p50=1.135 msec
# -t get
GET: 470167.88 requests per second, p50=0.943 msec
8-byte reduction
# -t set,get
SET: 319550.06 requests per second, p50=1.375 msec
GET: 384349.31 requests per second, p50=1.143 msec
# -t get
GET: 469329.34 requests per second, p50=0.959 msec
16-byte reduction
# -t set,get
SET: 300210.12 requests per second, p50=1.431 msec
GET: 391880.25 requests per second, p50=1.143 msec
# -t get
GET: 443066.03 requests per second, p50=0.943 msec
Interesting! But with a larger LUT, when it is loaded into L1 cache, other stuff are evicted that might be needed later, so it may depend on use case if it's actually faster in all cases.
How much memory would the 2-byte reduction variant use? The current draft is for the 16-byte reduction, right?
Hi @zuiderkwast ,
How much memory would the 2-byte reduction variant use?
It will use 1 KB memory for LUT (256 entries * 2 bytes of each entry * 2 times for 2-byte lookup at the same time = 1024 bytes = 1 KB).
The current draft is for the 16-byte reduction, right?
Yes, the current draft uses 16-byte reduction.