llvm-project
llvm-project copied to clipboard
[AArch64] Use isKnownNonZero to optimize eligible compares to cmn
Turning a cmp into cmn saves an extra mov and negate instruction, so take that into account when choosing when to flip the compare operands.
Also do not consider right-hand operands whose absolute value can be encoded into a cmn.
@llvm/pr-subscribers-backend-aarch64
Author: AtariDreams (AtariDreams)
Changes
Full diff: https://github.com/llvm/llvm-project/pull/96349.diff
2 Files Affected:
- (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+8-6)
- (modified) llvm/test/CodeGen/AArch64/cmp-chains.ll (+31)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 0c834acacdca9..6b6cfe7e82a1e 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -3372,9 +3372,10 @@ static bool isLegalArithImmed(uint64_t C) {
// So, finally, the only LLVM-native comparisons that don't mention C and V
// are SETEQ and SETNE. They're the only ones we can safely use CMN for in
// the absence of information about op2.
-static bool isCMN(SDValue Op, ISD::CondCode CC) {
+static bool isCMN(SDValue Op, ISD::CondCode CC, SelectionDAG &DAG) {
return Op.getOpcode() == ISD::SUB && isNullConstant(Op.getOperand(0)) &&
- (CC == ISD::SETEQ || CC == ISD::SETNE);
+ (CC == ISD::SETEQ || CC == ISD::SETNE ||
+ DAG.isKnownNeverZero(Op.getOperand(1)));
}
static SDValue emitStrictFPComparison(SDValue LHS, SDValue RHS, const SDLoc &dl,
@@ -3419,11 +3420,11 @@ static SDValue emitComparison(SDValue LHS, SDValue RHS, ISD::CondCode CC,
// register to WZR/XZR if it ends up being unused.
unsigned Opcode = AArch64ISD::SUBS;
- if (isCMN(RHS, CC)) {
+ if (isCMN(RHS, CC, DAG)) {
// Can we combine a (CMP op1, (sub 0, op2) into a CMN instruction ?
Opcode = AArch64ISD::ADDS;
RHS = RHS.getOperand(1);
- } else if (isCMN(LHS, CC)) {
+ } else if (isCMN(LHS, CC, DAG)) {
// As we are looking for EQ/NE compares, the operands can be commuted ; can
// we combine a (CMP (sub 0, op1), op2) into a CMN instruction ?
Opcode = AArch64ISD::ADDS;
@@ -3527,7 +3528,8 @@ static SDValue emitConditionalComparison(SDValue LHS, SDValue RHS,
}
} else if (RHS.getOpcode() == ISD::SUB) {
SDValue SubOp0 = RHS.getOperand(0);
- if (isNullConstant(SubOp0) && (CC == ISD::SETEQ || CC == ISD::SETNE)) {
+ if (isNullConstant(SubOp0) && (CC == ISD::SETEQ || CC == ISD::SETNE ||
+ DAG.isKnownNeverZero(RHS.getOperand(1)))) {
// See emitComparison() on why we can only do this for SETEQ and SETNE.
Opcode = AArch64ISD::CCMN;
RHS = RHS.getOperand(1);
@@ -3848,7 +3850,7 @@ static SDValue getAArch64Cmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
// can be turned into:
// cmp w12, w11, lsl #1
if (!isa<ConstantSDNode>(RHS) || !isLegalArithImmed(RHS->getAsZExtVal())) {
- SDValue TheLHS = isCMN(LHS, CC) ? LHS.getOperand(1) : LHS;
+ SDValue TheLHS = isCMN(LHS, CC, DAG) ? LHS.getOperand(1) : LHS;
if (getCmpOperandFoldingProfit(TheLHS) > getCmpOperandFoldingProfit(RHS)) {
std::swap(LHS, RHS);
diff --git a/llvm/test/CodeGen/AArch64/cmp-chains.ll b/llvm/test/CodeGen/AArch64/cmp-chains.ll
index 14cb0c82b1c03..0713fdb7283a9 100644
--- a/llvm/test/CodeGen/AArch64/cmp-chains.ll
+++ b/llvm/test/CodeGen/AArch64/cmp-chains.ll
@@ -258,3 +258,34 @@ define i32 @neg_range_int(i32 %a, i32 %b, i32 %c) {
ret i32 %retval.0
}
+; (b > -3 || a < -(c | 1))
+define i32 @neg_range_int_cmn(i32 %a, i32 %b, i32 %c) {
+; SDISEL-LABEL: neg_range_int_cmn:
+; SDISEL: // %bb.0:
+; SDISEL-NEXT: orr w8, w2, #0x1
+; SDISEL-NEXT: cmn w8, w0
+; SDISEL-NEXT: ccmn w1, #3, #0, le
+; SDISEL-NEXT: csel w0, w1, w0, gt
+; SDISEL-NEXT: ret
+;
+; GISEL-LABEL: neg_range_int_cmn:
+; GISEL: // %bb.0:
+; GISEL-NEXT: orr w8, w2, #0x1
+; GISEL-NEXT: cmn w1, #3
+; GISEL-NEXT: neg w8, w8
+; GISEL-NEXT: cset w9, gt
+; GISEL-NEXT: cmp w8, w0
+; GISEL-NEXT: cset w8, gt
+; GISEL-NEXT: orr w8, w9, w8
+; GISEL-NEXT: and w8, w8, #0x1
+; GISEL-NEXT: tst w8, #0x1
+; GISEL-NEXT: csel w0, w1, w0, ne
+; GISEL-NEXT: ret
+ %or = or i32 %c, 1
+ %sub = sub nsw i32 0, %or
+ %cmp = icmp sgt i32 %b, -3
+ %cmp1 = icmp sgt i32 %sub, %a
+ %1 = select i1 %cmp, i1 true, i1 %cmp1
+ %ret = select i1 %1, i32 %b, i32 %a
+ ret i32 %ret
+}
@arsenm What do you think of this?
@dtcxzyw does this have an adverse effect on compile time? I didn't find any when I profiled LLVM.
:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.
@arsenm @davemgreen Thoughts?
@jonathandavies-arm What are your thoughts on this?
@davemgreen This should be ready now!
I think this looks OK now, and the test all look OK. Do you mind if we wait until after the release branch to commit, in case there is something still wrong?
It would be good to have test cases for lots of various cases too, even if they don't trigger - different signed conditions, cmn on the LHS/RHS, ccmn on the LHS/RHS etc. Thanks
If something goes wrong, we can revert it. However, I've tested and tested this and found no issue.
I'm working on an expansion to this but that will sadly have to wait until after the release branch, but this is the part that has been thoroughly tested
If something goes wrong, we can revert it. However, I've tested and tested this and found no issue.
This isn't really the point of adding tests.
OK - I was just being careful as we are so close to the release and some of the issues can be subtle. I gave it a more thorough test and everything seems OK.
By "tested this and found no issue", does that include running a bootstrap and the llvm-test-suite? If so and you add a some extra test coverage then this LGTM.
OK - I was just being careful as we are so close to the release and some of the issues can be subtle. I gave it a more thorough test and everything seems OK.
By "tested this and found no issue", does that include running a bootstrap and the llvm-test-suite? If so and you add a some extra test coverage then this LGTM.
Yes I ran the entire test suite.
OK. If that has succeeded and a bootstrap also does OK, then if you can add some more tests for various ccmn cases we can get this in.
@davemgreen I have reverted this back because otherwise I'm gonna have to do another bootstrap test, and I think that change can be accepted after this one instead of part of this PR.
So this should be good to go!