IACA analysis incompatible with clang (3.8 and 4.0)
I have tried kerncraft (current checkout, 0.5.7) with the himeno.c code and clang:
(python) gh@einstein:~/programming/python/kerncraft/examples$ kerncraft --machine machine-files/BroadwellEP_E5-2697_CoD.yml --pmodel ECM -D M 50 -D N 50 -D L 500 --cache-predictor LC --compiler clang-4.0 kernels/himeno.c
[...]
IACA analysis failed: pointer_increment could not be detected automatically
This happens with clang 3.8 and 4.0.
Update: This seems to occur when the index increment is exactly 1, which happens when the compiler neither vectorizes nor otherwise unrolls the loop:
vmovss %xmm0, (%rsi,%rdx,4)
incq %rdx
cmpl %edx, %ebx
jne .LBB0_127
In this case the index register (here rdx) is incremented not by an addq instruction but by a plain incq. The problem is not exclusive to clang; if I prevent vectorization with the Intel compiler, the same error occurs.
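To illustrate the point, here is a minimal sketch of an increment detector that treats incq like an addq of 1. It is not kerncraft's actual code: `register_step` and both regular expressions are hypothetical names, and the sketch only handles AT&T syntax with decimal immediates.

```python
import re

# Hypothetical patterns (not taken from kerncraft); AT&T syntax, decimal immediates only.
INC_RE = re.compile(r"^\s*(inc|dec)[bwlq]?\s+%(\w+)\s*$")
ADD_RE = re.compile(r"^\s*(add|sub)[bwlq]?\s+\$(-?\d+),\s*%(\w+)\s*$")


def register_step(lines, register):
    """Sum the per-iteration change applied to `register` in a loop body."""
    step = 0
    for line in lines:
        m = INC_RE.match(line)
        if m and m.group(2) == register:
            # inc/dec change the register by exactly one.
            step += 1 if m.group(1) == "inc" else -1
            continue
        m = ADD_RE.match(line)
        if m and m.group(3) == register:
            value = int(m.group(2))
            step += value if m.group(1) == "add" else -value
    return step


loop = [
    "vmovss %xmm0, (%rsi,%rdx,4)",
    "incq %rdx",
    "cmpl %edx, %ebx",
    "jne .LBB0_127",
]
# The store uses %rdx as index with scale 4, so bytes per iteration = step * 4.
pointer_increment = register_step(loop, "rdx") * 4
```

For the loop above, the index advances by one element per iteration and the store scales it by 4, so the byte increment comes out as 4.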
Still does not work reliably in 0.6.0:
Executing (compile): clang-5.0 -Ofast -mavx -D_POSIX_C_SOURCE=200112L -std=c99 himeno.c_compilable.c -S -I/home/gh/programming/python/lib/python3.6/site-packages/kerncraft/headers/
IACA analysis failed: pointer_increment could not be detected automatically. Use --pointer-increment to set manually to byte offset of store pointer address between consecutive assembly block iterations
The loop mechanics look like this:
incq %r10
cmpq %r10, %r14
jne .LBB0_126
Can you paste the whole assembly block? The detection uses more than just the loop mechanics.
Here we are:
Had to rename it, since GitHub does not allow .s files as attachments :-/ This was generated with clang 4.0 and the -O3 -mavx options. The loop mechanics are different from the ones above, though (those were generated with clang 5.0).
Difficult one. There are two stores in the loop, one of which goes onto the stack. That confuses the increment detector, because the stack store's offset does not change from one iteration to the next, so the detected loop increment would be 0. One workaround would be to ignore anything related to the stack pointer register, but who knows whether another compiler will decide to use it in a different way?
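A rough sketch of that workaround, under stated assumptions: `non_stack_stores`, `STORE_DEST_RE` and `STACK_REGISTERS` are hypothetical names, not part of kerncraft. Only mov-family instructions in AT&T syntax whose last operand is a memory reference are treated as stores, which is a simplification.

```python
import re

# Hypothetical pattern: a memory operand such as 12(%rsp) or (%rsi,%rdx,4)
# at the end of the line, i.e. in the destination position of AT&T syntax.
STORE_DEST_RE = re.compile(
    r"(-?\d+)?\(%(\w+)(?:,\s*%(\w+))?(?:,\s*(\d+))?\)\s*$"
)
# Assumption: only the stack pointer itself is filtered out.
STACK_REGISTERS = {"rsp", "esp"}


def non_stack_stores(lines):
    """Return (base, index, scale) for stores not addressed via the stack pointer."""
    stores = []
    for line in lines:
        text = line.strip()
        # Simplification: only mov-family mnemonics count as stores here.
        if not text.startswith(("mov", "vmov")):
            continue
        m = STORE_DEST_RE.search(text)
        if m is None:
            continue  # destination is a register, so this is a load or a reg-reg move
        base, index, scale = m.group(2), m.group(3), m.group(4)
        if base in STACK_REGISTERS:
            continue  # stack store: its address is constant across iterations
        stores.append((base, index, int(scale) if scale else 1))
    return stores


loop = [
    "vmovss %xmm0, (%rsi,%rdx,4)",
    "vmovss %xmm1, 12(%rsp)",
    "incq %rdx",
]
candidates = non_stack_stores(loop)
```

With the stack store filtered out, only the store through %rsi and %rdx is left for the increment detection. The caveat above still applies: a compiler that addresses real array data relative to the stack pointer would be filtered out by mistake.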