
IACA analysis incompatible with clang (3.8 and 4.0)

Open rrzeschorscherl opened this issue 8 years ago • 5 comments

I have tried kerncraft (current checkout, 0.5.7) with the himeno.c code and clang:

(python) gh@einstein:~/programming/python/kerncraft/examples$ kerncraft --machine machine-files/BroadwellEP_E5-2697_CoD.yml --pmodel ECM -D M 50 -D N 50 -D L 500 --cache-predictor LC --compiler clang-4.0 kernels/himeno.c
[...]
IACA analysis failed: pointer_increment could not be detected automatically

This happens with clang 3.8 and 4.0.

rrzeschorscherl avatar Nov 27 '17 16:11 rrzeschorscherl

Update: This seems to occur when the index increment is exactly 1, which happens when the compiler neither vectorizes nor otherwise unrolls the loop:

        vmovss  %xmm0, (%rsi,%rdx,4)
        incq    %rdx
        cmpl    %edx, %ebx
        jne     .LBB0_127

In this case the index register (here rdx) is not increased by an addq instruction but by a simple incq. It's not exclusive to clang; if I prevent vectorization with the Intel compiler, the same error occurs.
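
To illustrate why incq has to be treated like addq $1, here is a minimal sketch of a byte-increment detector for AT&T-syntax loop bodies. It is not kerncraft's actual implementation: the names MEM_STORE, STEP and pointer_increment are hypothetical, it only looks at the first store it finds, and it does not resolve register aliases such as %edx versus %rdx.

```python
import re

# Hypothetical patterns for illustration only, not kerncraft's parser.
# A store in AT&T syntax: "<mnemonic> %src, disp(%base,%index,scale)"
MEM_STORE = re.compile(
    r"^\s*\w+\s+%\w+,\s*"              # mnemonic and source register
    r"(?P<disp>-?\d+)?"                # optional displacement
    r"\((?P<base>%\w+)"                # base register
    r"(?:,\s*(?P<index>%\w+)"          # optional index register
    r"(?:,\s*(?P<scale>[1248]))?)?\)\s*$"
)
# Register updates: inc/dec (implicit step 1) or add/sub with an immediate.
STEP = re.compile(
    r"^\s*(?P<op>inc|dec|add|sub)[lq]?\s+"
    r"(?:\$(?P<imm>-?\d+),\s*)?(?P<reg>%\w+)\s*$"
)


def pointer_increment(lines):
    """Return the byte step of the first store per iteration, or None."""
    steps = {}
    for line in lines:
        m = STEP.match(line)
        if not m:
            continue
        op, imm, reg = m.group("op", "imm", "reg")
        if op in ("inc", "dec") and imm is None:
            delta = 1                  # incq/decq carry no immediate
        elif op in ("add", "sub") and imm is not None:
            delta = int(imm)
        else:
            continue                   # e.g. register-register add: unknown step
        if op in ("dec", "sub"):
            delta = -delta
        steps[reg] = steps.get(reg, 0) + delta

    for line in lines:
        m = MEM_STORE.match(line)
        if m:
            scale = int(m.group("scale") or 1)
            return (steps.get(m.group("base"), 0)
                    + scale * steps.get(m.group("index"), 0))
    return None


loop = [
    "        vmovss  %xmm0, (%rsi,%rdx,4)",
    "        incq    %rdx",
    "        cmpl    %edx, %ebx",
    "        jne     .LBB0_127",
]
print(pointer_increment(loop))  # 4
```

For the loop above, the index register advances by 1 and the store uses a scale of 4, so the store address moves by 4 bytes per iteration. A detector that only recognizes addq would see no step at all here.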

rrzeschorscherl avatar Nov 27 '17 19:11 rrzeschorscherl

Still does not work reliably in 0.6.0:

Executing (compile):  clang-5.0 -Ofast -mavx -D_POSIX_C_SOURCE=200112L -std=c99 himeno.c_compilable.c -S -I/home/gh/programming/python/lib/python3.6/site-packages/kerncraft/headers/
IACA analysis failed: pointer_increment could not be detected automatically. Use --pointer-increment to set manually to byte offset of store pointer address between consecutive assembly block iterations

The loop mechanics look like this:

        incq    %r10
        cmpq    %r10, %r14
        jne     .LBB0_126

rrzeschorscherl avatar Dec 09 '17 13:12 rrzeschorscherl

Can you paste me the whole assembly block? There is more than just the loop mechanics being used for the detection.

cod3monk avatar Dec 11 '17 07:12 cod3monk

Here we are:

himeno.c_compilable.txt

Had to rename it - GitHub does not allow .s files as attachments :-/ This was generated with clang 4.0 and the -O3 -mavx options. The loop mechanics are different from the ones above, though (those were generated with clang 5.0).

rrzeschorscherl avatar Dec 11 '17 10:12 rrzeschorscherl

Difficult one. There are two stores in the loop, one of which goes onto the stack. That confuses the increment detector, because the stack store's address does not change from one iteration to the next, so the detected loop increment would be 0. One workaround would be to ignore anything related to the stack pointer register, but who knows if another compiler will decide to make use of it in another way?
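
A rough sketch of that workaround, under stated assumptions: the loop body below is made up for illustration (it is not taken from the attached file), the names STACK_REGS and store_increments are hypothetical, and treating %rbp as a stack register is itself an assumption that breaks when the compiler uses it as a general-purpose register.

```python
import re

# Assumption: stores addressed through these registers are stack spills.
STACK_REGS = {"%rsp", "%rbp"}

# Memory destination operand at the end of an AT&T-syntax instruction.
MEM_DST = re.compile(
    r",\s*(-?\d+)?\((%\w+)(?:,\s*(%\w+)(?:,\s*([1248]))?)?\)\s*$"
)
# inc/dec, or add/sub with an immediate, applied to a register.
STEP = re.compile(
    r"^\s*(inc|dec|add|sub)[lq]?\s+(?:\$(-?\d+),\s*)?(%\w+)\s*$"
)


def store_increments(lines, ignore_stack=False):
    """Return the per-iteration byte step of every store in the loop body."""
    steps = {}
    for line in lines:
        m = STEP.match(line)
        if not m:
            continue
        op, imm, reg = m.groups()
        if op in ("inc", "dec") and imm is None:
            delta = 1
        elif op in ("add", "sub") and imm is not None:
            delta = int(imm)
        else:
            continue
        if op in ("dec", "sub"):
            delta = -delta
        steps[reg] = steps.get(reg, 0) + delta

    increments = []
    for line in lines:
        m = MEM_DST.search(line)
        if not m:
            continue
        _disp, base, index, scale = m.groups()
        if ignore_stack and base in STACK_REGS:
            continue                   # skip stores that go onto the stack
        increments.append(
            steps.get(base, 0) + int(scale or 1) * steps.get(index, 0)
        )
    return increments


# Illustrative loop body with one stack store and one array store.
loop = [
    "        vmovss  %xmm1, 12(%rsp)",
    "        vmovss  %xmm0, (%rsi,%rdx,4)",
    "        incq    %rdx",
    "        cmpl    %edx, %ebx",
    "        jne     .LBB0_127",
]
print(store_increments(loop))                     # [0, 4]
print(store_increments(loop, ignore_stack=True))  # [4]
```

Without the filter the two stores disagree (0 and 4 bytes), so no single increment can be reported. With the filter only the array store remains, but the result is only as reliable as the assumption about which registers point into the stack.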

cod3monk avatar Jan 09 '18 08:01 cod3monk