
Optimize ZSTD_decodeSequence when ofBits==0

Open · yoniko opened this issue 2 years ago · 2 comments

This patch adds a branch to previously branchless code in the decompression hot loop, in the path that handles the case where ofBits == 0. Even though it adds a branch, the branch avoids instructions that introduce a memory dependency, as well as unneeded memory operations, when the condition isn't met.
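A minimal sketch of the idea follows. It is not the zstd source. It models only the repeat-offset history, uses hypothetical names (`RepState`, `rep_branchless`, `rep_branched`, `check_equivalent`), and assumes that in the ofBits == 0 path the offset is taken from that history, with `ll0` (literal length is zero, 0 or 1) selecting the entry. When `ll0` is 0, the branchless stores write back values that are already in place, so a branch can skip them. When `ll0` is 1, both versions swap the two most recent offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified repeat-offset history:
 * rep[0] is the most recent offset, rep[1] the one before, and so on. */
typedef struct {
    size_t rep[3];
} RepState;

/* Branchless variant (ll0 must be 0 or 1): always loads and stores.
 * When ll0 == 0, both stores write back the values already present. */
static size_t rep_branchless(RepState *s, int ll0)
{
    size_t const offset = s->rep[ll0];
    s->rep[1] = s->rep[!ll0];
    s->rep[0] = offset;
    return offset;
}

/* Branched variant: when ll0 == 0 the history is already correct,
 * so it returns rep[0] without any stores. */
static size_t rep_branched(RepState *s, int ll0)
{
    if (!ll0) {
        return s->rep[0];
    }
    size_t const offset = s->rep[1];
    s->rep[1] = s->rep[0];   /* swap the two most recent offsets */
    s->rep[0] = offset;
    return offset;
}

/* Runs both variants on copies of the same history and asserts that they
 * return the same offset and leave identical histories. */
static void check_equivalent(RepState start, int ll0)
{
    RepState a = start, b = start;
    assert(rep_branchless(&a, ll0) == rep_branched(&b, ll0));
    for (int i = 0; i < 3; i++) {
        assert(a.rep[i] == b.rep[i]);
    }
}
```

Whether the branched form wins depends on how predictable `ll0` is in real data, which is what the benchmarks in this thread measure.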

Testing on Intel Skylake shows decompression speed improvements of 1% to 7% across different corpora and compilers. On an M1 MacBook Pro, performance is mostly neutral, with possibly a very small regression.

Full benchmark results: https://docs.google.com/spreadsheets/d/1hEUY5Gkf6Ebz6Gq5X9U5mURC_SsI43BhIFpE7uBDVsw/edit?usp=sharing

yoniko · May 22 '23 16:05

Seems reasonable for most data, since we probably almost never use ll0 repcodes. I wonder what the perf looks like when we do; e.g., maybe kennedy.xls has this pattern.
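For context, "ll0 repcodes" are repeat-offset codes used by a sequence whose literal length is 0. The Repeat Offsets rules in RFC 8878 shift their meaning by one in that case. A minimal sketch of those rules follows. `resolve_repcode` is a hypothetical helper, not a zstd API, and it omits the corrupted-input handling a real decoder needs.

```c
#include <stddef.h>

/* Hypothetical helper (not zstd's API) following the Repeat Offsets rules of
 * RFC 8878. offset_value is a repeat code in 1..3, rep[0..2] is the offset
 * history (most recent first), and ll0 is nonzero when literals_length == 0.
 * Corrupted-input handling (e.g. rep[0] - 1 == 0) is omitted. */
static size_t resolve_repcode(size_t rep[3], unsigned offset_value, int ll0)
{
    /* Without ll0: 1 -> rep[0], 2 -> rep[1], 3 -> rep[2].
     * With ll0 the codes shift by one: 1 -> rep[1], 2 -> rep[2],
     * 3 -> rep[0] - 1. */
    unsigned const idx = ll0 ? offset_value : offset_value - 1; /* 0..3 */
    size_t const offset = (idx == 3) ? rep[0] - 1 : rep[idx];

    if (idx == 0) {
        return offset;        /* first entry used: history unchanged */
    }
    if (idx != 1) {
        rep[2] = rep[1];      /* third entry or rep[0] - 1: shift history */
    }
    rep[1] = rep[0];          /* old first entry moves to second place */
    rep[0] = offset;          /* selected offset becomes the most recent */
    return offset;
}
```

In this model, an offset code with no extra bits (ofBits == 0) yields offset_value 1. That value modifies the history only when ll0 is set, which is the rarely taken side of the new branch.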

terrelln · May 23 '23 22:05

I will run benchmarks on my server as well.

terrelln · May 23 '23 22:05