Hans Kristian Rosbach
RPI5

### Develop Dec9 GCC
```
 Level   Comp      Comptime min/avg/max/stddev   Decomptime min/avg/max/stddev   Compressed size
 1       48.164%   0.2779/0.2901/0.2940/0.0043   0.0982/0.1126/0.1178/0.0053     21,937,619
 2       39.504%   0.4891/0.5038/0.5083/0.0044   0.0979/0.1092/0.1151/0.0048     17,993,257
 3       37.948%   0.5918/0.6071/0.6117/0.0046   0.0961/0.1052/0.1111/0.0046     17,284,459
 4...
```
AMD 8700GE Zen4

### Develop Dec 9
```
 Level   Comp      Comptime min/avg/max/stddev   Decomptime min/avg/max/stddev   Compressed size
 1       48.164%   0.1337/0.1346/0.1349/0.0004   0.0566/0.0568/0.0570/0.0001     21,937,619
 2       39.504%   0.2350/0.2357/0.2360/0.0003   0.0557/0.0560/0.0565/0.0002     17,993,257
 3       37.948%   0.2869/0.2877/0.2880/0.0003   0.0530/0.0533/0.0537/0.0002...
```
i7-11700K

### Develop 9 Dec
```
 Level   Comp      Comptime min/avg/max/stddev   Decomptime min/avg/max/stddev   Compressed size
 1       48.164%   0.1963/0.1969/0.1973/0.0002   0.0791/0.0793/0.0794/0.0001     21,937,619
 2       39.504%   0.3198/0.3207/0.3217/0.0005   0.0780/0.0787/0.0790/0.0004     17,993,257
 3       37.948%   0.3946/0.3955/0.3962/0.0004   0.0740/0.0742/0.0742/0.0000     17,284,459
 4...
```
The benchmarked changes:
```diff
diff --git a/inffast_tpl.h b/inffast_tpl.h
index f12408f3..6e389395 100644
--- a/inffast_tpl.h
+++ b/inffast_tpl.h
@@ -101,7 +101,7 @@ void Z_INTERNAL INFLATE_FAST(PREFIX3(stream) *strm, uint32_t start) {
 with (1> state->bits) ==...
```
MSVC failures: `inftrees.c(234): warning C4242: 'function': conversion from 'unsigned int' to 'uint16_t', possible loss of data`
It seems I am not able to benchmark this reliably. But how about making `bi_reverse` take a `uint16_t` code directly? After all, it already converts code down to `uint16_t`, and it...
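As a rough illustration of that suggestion, here is a minimal sketch of a bit-reversal helper whose `code` parameter is already `uint16_t`. The name `bi_reverse16` and the loop body are assumptions made for this example; this is not the zlib-ng implementation.

```c
#include <stdint.h>

/* Hypothetical helper, not the zlib-ng source: reverse the low `len` bits
 * of `code`, for 1 <= len <= 16. Because the parameter is uint16_t, callers
 * narrow the value explicitly instead of relying on a conversion inside. */
static uint16_t bi_reverse16(uint16_t code, int len) {
    uint16_t res = 0;
    for (int i = 0; i < len; i++) {
        /* Shift the result left and append the current lowest bit of code. */
        res = (uint16_t)((res << 1) | (code & 1u));
        code = (uint16_t)(code >> 1);
    }
    return res;
}
```

With a signature like this, a narrowing warning such as C4242 would appear at the call site if a caller passes a wider `unsigned int`, so the cast would have to be written there.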
Tested this on RPI5:
```
-------------------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------------------
crc32/armv8/1            2.17 ns         2.17 ns   1287449066
crc32/armv8/8            4.35 ns         4.35 ns    643726859
crc32/armv8/12           4.35 ns         4.35 ns    643742153
crc32/armv8/16...
```
> [gist.github.com/nmoinvaz/b56489b6643156df798ea8f04d1ceefd](https://gist.github.com/nmoinvaz/b56489b6643156df798ea8f04d1ceefd) @Dead2 what do you think? Is there any other way to detect arm cpu model/manufacturer? /proc/cpuinfo might not be accessible as a non-root user. It also requires a...
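One possible alternative on Linux arm64, offered here only as a sketch and not taken from the thread, is to read the `MIDR_EL1` value that the kernel exposes through sysfs and decode its implementer and part-number fields. The helper names `midr_decode` and `midr_read_sysfs` are hypothetical, and whether the sysfs file is readable in a given environment is an assumption that would need checking.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Decoded fields of the AArch64 MIDR_EL1 register. */
typedef struct {
    unsigned implementer; /* bits [31:24], e.g. 0x41 for Arm Ltd. */
    unsigned variant;     /* bits [23:20] */
    unsigned partnum;     /* bits [15:4], e.g. 0xd0b for Cortex-A76 */
    unsigned revision;    /* bits [3:0] */
} midr_info;

/* Hypothetical helper: split a raw MIDR_EL1 value into its fields. */
static midr_info midr_decode(uint64_t midr) {
    midr_info info;
    info.implementer = (unsigned)((midr >> 24) & 0xff);
    info.variant     = (unsigned)((midr >> 20) & 0xf);
    info.partnum     = (unsigned)((midr >> 4) & 0xfff);
    info.revision    = (unsigned)(midr & 0xf);
    return info;
}

/* Hypothetical helper: read MIDR_EL1 for cpu0 from sysfs.
 * Assumes a Linux arm64 kernel that provides this file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
static int midr_read_sysfs(uint64_t *out) {
    const char *path =
        "/sys/devices/system/cpu/cpu0/regs/identification/midr_el1";
    char buf[32];
    char *end;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    *out = strtoull(buf, &end, 16); /* accepts an optional 0x prefix */
    return (end == buf) ? -1 : 0;
}
```

A caller would try `midr_read_sysfs` first and fall back to the generic code path when it returns -1, so the check fails safe on systems without the file.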
What I'm left wondering is whether it is possible to optimize for the single-lane PMULL, for example by interleaving PMULL and regular scalar code. I have not done more than glance at...
> @Dead2 can you try this function on your RP, it passes gtest_lib. [gist.github.com/nmoinvaz/889dafb1f9c182b59192ec3d45729a55](https://gist.github.com/nmoinvaz/889dafb1f9c182b59192ec3d45729a55)

Sorry, that is even worse for some reason.
```
crc32/armv8/1            2.17 ns         2.17 ns   1287459366
crc32/armv8/8...
```