llvm15 generates 1.27x longer code for znver3 than icelake-server
As of https://github.com/numba/llvmlite/commit/0cc7741314a4d556b00f79a7c13783f292d30be6
Observation
Numba CI is seeing a massive difference (>2x) in memory use depending on which CPU Azure CI allocates. znver3 workers use far more memory than icelake-server workers, and jobs fail because the Numba test suite exhausts all available memory (even though the znver3 workers have 1GB of extra memory).
https://gist.github.com/sklam/03ba3ae6826f265235f1b5d7cd825d37 contains a reproducer that uses Numba to compile np.in1d into LLVM IR. Run it with NUMBA_OPT=0, then use the shell script to compile the IR for each specific CPU. The znver3 assembly is 1.27x longer (in lines) than the icelake-server version.
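For anyone who wants to repeat the comparison without the shell script, here is a minimal Python sketch of the same idea. It is not the script from the gist: the `-O3` level, the output file naming, and the helper names `llc_command`, `count_lines`, and `compare` are assumptions made for illustration, and it expects an `llc` binary matching the LLVM version under test to be on `PATH`.

```python
import subprocess
import sys
from pathlib import Path


def llc_command(ir_path, cpu, out_path, llc="llc", opt_level="-O3"):
    """Build an llc invocation that emits assembly for one target CPU.

    The optimization level is an assumption; adjust it to match the gist.
    """
    return [llc, opt_level, f"-mcpu={cpu}", "-filetype=asm",
            str(ir_path), "-o", str(out_path)]


def count_lines(text):
    """Count lines of assembly text (the metric used in this issue)."""
    return len(text.splitlines())


def compare(ir_path, cpus=("icelake-server", "znver3")):
    """Compile the same IR for each CPU and return {cpu: line_count}."""
    counts = {}
    for cpu in cpus:
        # Hypothetical naming scheme: <stem>.<cpu>.s in the current directory.
        out_path = Path(f"{Path(ir_path).stem}.{cpu}.s")
        subprocess.run(llc_command(ir_path, cpu, out_path), check=True)
        counts[cpu] = count_lines(out_path.read_text())
    return counts


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: compare_cpus.py <file.ll>")
    counts = compare(sys.argv[1])
    for cpu, n in counts.items():
        print(f"{cpu}: {n} lines")
    print(f"ratio: {counts['znver3'] / counts['icelake-server']:.2f}x")
```

Running it against the IR dumped by the reproducer should print one line count per CPU and the znver3/icelake-server ratio.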
Likely Explanation
https://github.com/llvm/llvm-project/issues/50802 reports an aggressive unrolling issue with znver3 in LLVM that is only fixed in LLVM 19.1.0rc1 by https://github.com/llvm/llvm-project/pull/91340
@swap357 reported the following LLVM IR sizes:
- NUMBA_CPU_NAME="znver3" -> llvm ir: 40,753 lines, 2,387 KB
- NUMBA_CPU_NAME="" -> llvm ir: 6,737 lines, 475 KB
- NUMBA_CPU_NAME="x86-64" -> llvm ir: 9,200 lines, 611 KB
As of https://github.com/numba/llvmlite/commit/dee4d034b4c8c171e08a7224a96836c726d92cf7, I am seeing much smaller code with LLVM20. znver3 output (from the reproducer) is only 1.07x longer than the icelake output.
Reproducer output size by LLVM version and target CPU:
| LLVM Version | Architecture | Lines | Size |
|---|---|---|---|
| 20 | icelake | 10,762 | 375K bytes |
| 20 | znver3 | 11,612 | 396K bytes |
| 15 | icelake | 11,044 | 386K bytes |
| 15 | znver3 | 13,556 | 458K bytes |
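
As a quick check of the arithmetic, the znver3/icelake ratios can be recomputed directly from the table. This is only a sketch: the numbers are copied from the rows above, and `ROWS` and `znver3_ratio` are names introduced here for illustration.

```python
# (llvm_version, cpu, assembly_lines, size_in_K_bytes), copied from the table.
ROWS = [
    (20, "icelake", 10_762, 375),
    (20, "znver3", 11_612, 396),
    (15, "icelake", 11_044, 386),
    (15, "znver3", 13_556, 458),
]


def znver3_ratio(llvm_version, column):
    """Return the znver3 / icelake ratio for 'lines' or 'size'."""
    idx = {"lines": 0, "size": 1}[column]
    values = {cpu: (lines, size)
              for ver, cpu, lines, size in ROWS if ver == llvm_version}
    return values["znver3"][idx] / values["icelake"][idx]


for version in (15, 20):
    print(f"LLVM {version}: "
          f"lines {znver3_ratio(version, 'lines'):.2f}x, "
          f"size {znver3_ratio(version, 'size'):.2f}x")
```

With these rows the line-count ratio comes out at roughly 1.23x for LLVM 15 and 1.08x for LLVM 20. That is slightly different from the 1.27x and 1.07x quoted above, which may have come from a different run of the reproducer.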