Hans Kristian Rosbach

Results 204 comments of Hans Kristian Rosbach

I am not sure about this one. Let's consider Linux distros for a second. Most distros prefer some protection settings or other, and therefore we let them specify what they...

This part of the CMake output seems suspicious. If the compiler doesn't need any flags to build with, for example, AVX2, does that mean that all flags are always enabled...

Undocumented parameters are the best 😛. Seems like perhaps nvc could be handled like gcc but with `-tp px` prefixed to the common cflags then.

In my tests (so far), this is ~3.8% faster on Aarch64 (GCC) and results in 64 bytes smaller code size. On x86_64 (Clang), however, the code size increases by 224 bytes and it is 5.7% slower....

RPI 5 - Aarch64

### Develop Dec 7 GCC
```
Level  Comp     Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
1      54.185%  0.0989/0.1083/0.1106/0.0030  0.0299/0.0408/0.0444/0.0038    8,526,745
2      43.871%  0.1786/0.1898/0.1945/0.0034  0.0317/0.0389/0.0439/0.0033    6,903,702
3      42.388%...
```

AMD 8700GE - x86_64

### Develop GCC
```
Level  Comp     Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
1      54.185%  0.0577/0.0579/0.0580/0.0000  0.0241/0.0243/0.0243/0.0001    8,526,745
2      43.871%  0.1017/0.1019/0.1021/0.0001  0.0239/0.0240/0.0240/0.0000    6,903,702
3      42.388%  0.1231/0.1234/0.1235/0.0001  0.0230/0.0231/0.0232/0.0000...
```

> Rebased on top of latest `develop`. Benchmarks may yield better results now.

I don't think you pushed the rebase, or you didn't pull develop before rebasing. In any case...

This is a little weird.

RPI5: 6.78% faster
i7-11700K: 0.22% faster
AMD 8700GE: 4.13% slower

Is this PR doing something that the AMD Zen4 core does not like?

Full perf data for develop and pr are here: http://mirror.circlestorm.org/perf-pr2037.tar.gz

Extracts from perf annotate:
PR: https://pastebin.com/H1yXrW2g
Develop from approximately the same place: https://pastebin.com/Gy5jjES2