llvm-project
llvm-project copied to clipboard
Compile `urem N` to `SUB`+`CMOVB` instead of `IMUL` when the input is < 2N
Link for opt trunk: https://llvm.godbolt.org/z/b5bqv6zzq General Alive proof for divisor small enough to avoid addition overflow: https://alive2.llvm.org/ce/z/ntmx2a
Given an input like
%_4 = urem i32 %x, 13
%_6 = urem i32 %y, 13
%_3 = add i32 %_6, %_4
%_9 = urem i32 %_3, 13
then %_3
is in [0, 26), and thus to avoid needing a full IMUL
imul rcx, rax, 1321528399
shr rcx, 34
lea edx, [rcx + 2*rcx]
lea ecx, [rcx + 4*rdx]
sub eax, ecx
the last urem
can be simplified.
Either to icmp
+sub nuw
+select
(Alive proof: https://alive2.llvm.org/ce/z/BaHRwz)
%_8 = icmp uge i16 %_3, 13
%_7 = sub nuw i16 %_3, 13
%_9 = select i1 %_8, i16 %_7, i16 %_3
which compiles to
cmp cx, 13
lea eax, [rsi + rdi - 13]
cmovb eax, ecx
or — maybe slightly better? — to usub_with_overflow
+select
(Alive proof: https://alive2.llvm.org/ce/z/RPGqCb)
%0 = tail call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %_3, i32 13)
%over = extractvalue { i32, i1 } %0, 1
%small = extractvalue { i32, i1 } %0, 0
%_9 = select i1 %over, i32 %_3, i32 %small
which compiles to
mov eax, ecx
sub eax, 13
cmovb eax, ecx
Rust experiment that inspired me to open this: https://rust.godbolt.org/z/fjaba8fdP
@llvm/issue-subscribers-backend-x86
@rotateright InstCombine candidate or should we try to handle this in DAG?
I'd also like to know why that vectorization is happening.....
I'm not sure how to match it (need to calculate sums/diffs of ranges?), but it's an extension of an existing instcombine for urem (and we'd prefer regular "sub" to the usub.ov intrinsic form): https://alive2.llvm.org/ce/z/74yZsv
And yes, that vectorization looks like a bad idea even if we are recovering in codegen.
Fixed by https://github.com/llvm/llvm-project/commit/66efb986322b206834e7c9e1eb777fa053912c39.