miniz_oxide icon indicating copy to clipboard operation
miniz_oxide copied to clipboard

Investigate whether we can avoid a 2k lookup table in init_tree without reducing performance

Open oyvindln opened this issue 1 year ago • 2 comments

As noted in this PR, this lookup table used to increase performance is very large - maybe there is a better way of doing this?

https://github.com/Frommi/miniz_oxide/pull/152

oyvindln avatar May 17 '24 18:05 oyvindln

If anyone is interested in working on this, fdeflate uses a different strategy based on libdeflate. It keeps the codeword in bit-reversed form, and uses more complicated arithmetic to increment it to the next value.

fintelia avatar Dec 08 '24 22:12 fintelia

Yeah it might be worth looking into. Already tinkering a bit with some other tweaks to the huffman tree stuff in inflate.

Should note it's now a 2K table instead of 4k as I found that using 512-size one and bit reverse on larger numbers seemed to not hurt performance.

Also I haven't pushed it yet but also found that it could be avoided entirely on aarch64 and loongarch (and technically armv7 and newer 32-bit arm but didn't find an easy way to differentiate between different arm feature levels at compile time with just the standard lib) with no issue as those architectures have a bit reverse instruction and at least at least when testing on my aarch64 raspberry pi 3B it didn't seem to be any slower to use that.

oyvindln avatar Dec 09 '24 11:12 oyvindln

Still have to look into the libdeflate/fdeflate approach (maybe it's faster than the current one as well) though with 8e331bb it's now just 1k instead of 2k in size as it (and some variables) was using a larger data type than needed (this also seems to have made things a very tiny bit faster). Also disabled the lookup table for builds not using alloc as I'm assuming those setups probably need the extra 1k memory saving more than the tiny speed saving (and there may not be a speed saving anyhow if memory/cache is slow).

oyvindln avatar Mar 05 '25 23:03 oyvindln