llvm-project icon indicating copy to clipboard operation
llvm-project copied to clipboard

Compile `urem N` to `SUB`+`CMOVB` instead of `IMUL` when the input is < 2N

Open scottmcm opened this issue 2 years ago • 3 comments

Link for opt trunk: https://llvm.godbolt.org/z/b5bqv6zzq General Alive proof for divisor small enough to avoid addition overflow: https://alive2.llvm.org/ce/z/ntmx2a

Given an input like

  %_4 = urem i32 %x, 13
  %_6 = urem i32 %y, 13
  %_3 = add i32 %_6, %_4
  %_9 = urem i32 %_3, 13

then %_3 is in [0, 26), and thus to avoid needing a full IMUL

        imul    rcx, rax, 1321528399
        shr     rcx, 34
        lea     edx, [rcx + 2*rcx]
        lea     ecx, [rcx + 4*rdx]
        sub     eax, ecx

the last urem can be simplified.

Either to icmp+sub nuw+select (Alive proof: https://alive2.llvm.org/ce/z/BaHRwz)

  %_8 = icmp uge i16 %_3, 13
  %_7 = sub nuw i16 %_3, 13
  %_9 = select i1 %_8, i16 %_7, i16 %_3

which compiles to

        cmp     cx, 13
        lea     eax, [rsi + rdi - 13]
        cmovb   eax, ecx

or — maybe slightly better? — to usub_with_overflow+select (Alive proof: https://alive2.llvm.org/ce/z/RPGqCb)

  %0 = tail call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %_3, i32 13)
  %over = extractvalue { i32, i1 } %0, 1
  %small = extractvalue { i32, i1 } %0, 0
  %_9 = select i1 %over, i32 %_3, i32 %small

which compiles to

        mov     eax, ecx
        sub     eax, 13
        cmovb   eax, ecx

Rust experiment that inspired me to open this: https://rust.godbolt.org/z/fjaba8fdP

scottmcm avatar Oct 09 '22 21:10 scottmcm

@llvm/issue-subscribers-backend-x86

llvmbot avatar Oct 09 '22 21:10 llvmbot

@rotateright InstCombine candidate or should we try to handle this in DAG?

I'd also like to know why that vectorization is happening.....

RKSimon avatar Oct 14 '22 18:10 RKSimon

I'm not sure how to match it (need to calculate sums/diffs of ranges?), but it's an extension of an existing instcombine for urem (and we'd prefer regular "sub" to the usub.ov intrinsic form): https://alive2.llvm.org/ce/z/74yZsv

And yes, that vectorization looks like a bad idea even if we are recovering in codegen.

rotateright avatar Oct 17 '22 12:10 rotateright

Fixed by https://github.com/llvm/llvm-project/commit/66efb986322b206834e7c9e1eb777fa053912c39.

nikic avatar Jan 03 '23 14:01 nikic