
llvm15 generates 1.27x longer code for znver3 than icelake-server

Open sklam opened this issue 5 months ago • 2 comments

As of https://github.com/numba/llvmlite/commit/0cc7741314a4d556b00f79a7c13783f292d30be6

Observation

Numba CI is seeing a massive difference (>2x) in memory use depending on which CPU Azure CI allocates. znver3 workers use far more memory than icelake-server workers, and jobs fail because the Numba test suite exhausts all available memory, even though the znver3 workers have 1 GB more memory.

https://gist.github.com/sklam/03ba3ae6826f265235f1b5d7cd825d37 contains a reproducer that uses Numba to compile np.in1d into LLVM IR. Run it with NUMBA_OPT=0, then use the shell script to compile the IR for each specific CPU. The resulting znver3 assembly is 1.27x longer (in lines) than the icelake-server version.
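The gist's shell script is not reproduced here. As a rough sketch of the same kind of comparison, the snippet below compiles one LLVM IR file for several CPUs and counts the lines of assembly. It assumes a `clang` binary on `PATH` as the compiler driver (the gist may well use different tools), and the names `asm_command`, `count_lines`, `compare_cpus`, and the file name `in1d.ll` are hypothetical.

```python
import subprocess
from pathlib import Path


def asm_command(ir_path, cpu, opt_level=3):
    """Build a clang command that compiles LLVM IR to assembly for one CPU.

    Assumption: clang is the driver; the original gist may use other tools.
    """
    ir_path = Path(ir_path)
    # e.g. in1d.ll -> in1d.znver3.s
    out_path = ir_path.with_name(f"{ir_path.stem}.{cpu}.s")
    return [
        "clang",
        f"-O{opt_level}",
        f"-march={cpu}",
        "-S",
        str(ir_path),
        "-o",
        str(out_path),
    ]


def count_lines(path):
    """Count the lines in a text file."""
    with open(path) as f:
        return sum(1 for _ in f)


def compare_cpus(ir_path, cpus=("znver3", "icelake-server")):
    """Compile the IR once per CPU and return {cpu: assembly line count}."""
    counts = {}
    for cpu in cpus:
        cmd = asm_command(ir_path, cpu)
        subprocess.run(cmd, check=True)
        counts[cpu] = count_lines(cmd[-1])  # last argument is the output path
    return counts
```

With an IR file dumped from the reproducer, `compare_cpus("in1d.ll")` would return the per-CPU line counts, whose ratio can then be compared with the figure quoted above.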

Likely Explanation

https://github.com/llvm/llvm-project/issues/50802 reports an aggressive unrolling issue with znver3 in LLVM that is only fixed in LLVM 19.1.0rc1 by https://github.com/llvm/llvm-project/pull/91340

sklam, Aug 05 '25 13:08

@swap357 reported that

- NUMBA_CPU_NAME="znver3" -> LLVM IR: 40,753 lines, 2387 KB
- NUMBA_CPU_NAME="" -> LLVM IR: 6,737 lines, 475 KB
- NUMBA_CPU_NAME="x86-64" -> LLVM IR: 9,200 lines, 611 KB

sklam, Aug 08 '25 13:08

As of https://github.com/numba/llvmlite/commit/dee4d034b4c8c171e08a7224a96836c726d92cf7, I am seeing much smaller code with LLVM 20. The znver3 output (from the reproducer) is only 1.07x longer than the icelake output. Here's the data as a table:

| LLVM Version | Architecture | Lines | Size |
|---|---|---|---|
| 20 | icelake | 10,762 | 375K bytes |
| 20 | znver3 | 11,612 | 396K bytes |
| 15 | icelake | 11,044 | 386K bytes |
| 15 | znver3 | 13,556 | 458K bytes |
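To make the comparison explicit, here is a small sketch that recomputes the znver3-to-icelake ratios from the figures reported above. Only the numbers come from this thread; the names `ROWS` and `znver3_ratios` are hypothetical helpers for illustration.

```python
# (LLVM version, architecture, assembly lines, size in K bytes), as reported above
ROWS = [
    (20, "icelake", 10762, 375),
    (20, "znver3", 11612, 396),
    (15, "icelake", 11044, 386),
    (15, "znver3", 13556, 458),
]


def znver3_ratios(rows):
    """Return {llvm_version: (line_ratio, size_ratio)} for znver3 vs icelake."""
    by_version = {}
    for version, arch, lines, size_k in rows:
        by_version.setdefault(version, {})[arch] = (lines, size_k)

    ratios = {}
    for version, archs in by_version.items():
        zen_lines, zen_size = archs["znver3"]
        ice_lines, ice_size = archs["icelake"]
        ratios[version] = (zen_lines / ice_lines, zen_size / ice_size)
    return ratios


for version, (line_ratio, size_ratio) in sorted(znver3_ratios(ROWS).items()):
    print(f"LLVM {version}: lines x{line_ratio:.3f}, size x{size_ratio:.3f}")
```

These ratios describe only this one reproducer at this commit, so they may not match measurements taken at other commits or on other workloads.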

sklam, Aug 26 '25 15:08