llvm-project SimplifyDemandedBits should handle fneg/fabs/fcopysign

fneg, fabs and fcopysign are defined as bit operations, and should be subject to bitwise simplification. Fneg is xor with a signbit mask, fabs is and with a sign mask, and fcopysign is and signbit mask, and inverted signbit mask, and or.

This would allow deleting more specific optimizations in visitFCOPYSIGN

This was brought up in https://github.com/llvm/llvm-project/pull/97180#discussion_r1660844076_

Attaching starter patch to point where this needs to go: 0001-TODO-Support-FABS-FCOPYSIGN-FNEG-in-SimplifyDemanded.patch

Jul 02 '24 15:07 arsenm

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

Check that no other contributor has already been assigned to this issue. If you believe that no one is actually working on it despite an assignment, ping the person. After one week without a response, the assignee may be changed.
In the comments of this issue, request for it to be assigned to you, or just create a pull request after following the steps below. Mention this issue in the description of the pull request.
Fix the issue locally.
Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
Create a Git commit.
Run git clang-format HEAD~1 to format your changes.
Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation. Mention this issue in the description of the pull request.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.

Jul 02 '24 15:07 llvmbot

@llvm/issue-subscribers-good-first-issue

Author: Matt Arsenault (arsenm)

fneg, fabs and fcopysign are defined as bit operations, and should be subject to bitwise simplification. Fneg is xor with a signbit mask, fabs is and with a sign mask, and fcopysign is and signbit mask, and inverted signbit mask, and or.

This would allow deleting more specific optimizations in visitFCOPYSIGN

This was brought up in https://github.com/llvm/llvm-project/pull/97180#discussion_r1660844076_

Attaching starter patch to point where this needs to go: 0001-TODO-Support-FABS-FCOPYSIGN-FNEG-in-SimplifyDemanded.patch

Jul 02 '24 15:07 llvmbot

Hi! I would like to give this a try.

Jul 02 '24 16:07 hmalgewatta

hello @hmalgewatta . Are you still working on this?

Jul 04 '24 19:07 Mehdibenhadjkhelifa

@Mehdibenhadjkhelifa yes

Jul 04 '24 19:07 hmalgewatta

Hi @hmalgewatta, are you still working on it? Maybe I can take a look at this one.

Sep 10 '24 07:09 txff99

Hey @txff99 Are you still working on this? I'd love to help.

Dec 22 '24 16:12 gurvir-dhillon1

I'm looking at the starter patch and I'm still not quite sure where this code needs to go. I believe it should go in the switch statement, specifically after line 3605 where it states in a comment:

  // TODO: There are more binop opcodes that could be handled here - MIN,
  // MAX, saturated math, etc.

since there are a lot of floating point operations handled there. Can someone let me know if this is correct?

Dec 23 '24 23:12 gurvir-dhillon1

I'm looking at the starter patch and I'm still not quite sure where this code needs to go. I believe it should go in the switch statement, specifically after line 3605 where it states in a comment:

The starter patch points you where it would go, in a different switch than the one you mentioned. The correct one is in SimplifyDemandedBits. Fneg for example should look similar to https://github.com/llvm/llvm-project/blob/a72bfc5a1e5381012213df36389524f74ef7c8a3/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L1590, except most of the extra optimizations applied here don't really apply. At a minimum this needs the two SimplifyDemandedBits calls on each operand.

// TODO: There are more binop opcodes that could be handled here - MIN, // MAX, saturated math, etc. since there are a lot of floating point operations handled there. Can someone let me know if this is correct?

This is in SimplifyDemandedVectorElts, which is not what this ticket is asking for. You could handle these here as well, but it would be a separate issue.

Dec 26 '24 06:12 arsenm

I apologize for the slow progress as this is my first time working on LLVM, and I'm still getting up to speed.

I've been working on the code for handling the FNEG operation, but I'm unsure where to write my unit tests for it. While exploring the llvm/unittests/CodeGen directory, I found SelectionDAGAddressAnalysisTest.cpp and SelectionDAGPatternMatchTest.cpp, but I'm not sure whether I should create a new test file for SimplifyDemandedBits tests. The LLVM Testing Guide suggests that regression tests are typically preferred, so I'm also uncertain where (and how necessary) it would be to add regression tests for this change.

For reference, here's the code I added (though it is untested, so I’m not sure if it's worth reviewing in detail at this point):

  case ISD::FNEG: {
    // Only one operand to FNEG.
    SDValue Op0 = Op.getOperand(0);
    if (SimplifyDemandedBits(Op0, DemandedBits, DemandedElts, Known, TLO,
                             Depth + 1))
      return true;

    // If the operation can be done in a smaller type, do so.
    if (ShrinkDemandedOp(Op, BitWidth, DemandedBits, TLO))
      return true;

    // Include more potential optimizations for FNEG.

    // Flip the sign bit.
    Known = KnownBits::flipSignBit(Known);
    break;
  }

I initially considered using a sign-bit mask to flip the sign bit manually but noticed that the KnownBits::flipSignBit() function already provides this functionality, so I opted to use it instead. However, since I haven’t tested the code yet, I’m unsure if this approach is appropriate or aligns with best practices. Any advice or direction on where to place the tests would be greatly appreciated.

Thanks for your help!

Dec 30 '24 11:12 gurvir-dhillon1

You need to come up with some IR that has an operand that can be simplified. https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/copysign-simplify-demanded-bits.ll is one example.

In this case it's more difficult because the obvious optimizations are already separately implemented, which will hide the effect. Part of the point of doing this is to delete those special cases.

For the copysign case, I managed to come up with this contrived example. The or of the low bit of the copysign sign input has no impact on the result, but is still present in the final isa

Jan 02 '25 13:01 arsenm

I believe this issue is beyond my current ability and that I need to spend more time reading documentation and familiarizing myself more with LLVM before beginning to contribute. I was able to optimize the example you provided (the OR instruction no longer appeared in the ASM output), but I don't truly understand what a lot of the moving parts do in the code base and that makes me uncomfortable putting code forward that could potentially be merged into this huge project. I really appreciate the guidance and help, and I apologize for the time and effort I wasted. Please feel free to remove me as an assignee. I'll try to come back stronger!

Jan 12 '25 15:01 gurvir-dhillon1

I believe this issue is beyond my current ability and that I need to spend more time reading documentation and familiarizing myself more with LLVM before beginning to contribute.

Making changes will be far more effective than trying to read documentation to familiarize yourself

but I don't truly understand what a lot of the moving parts do in the code base and that makes me uncomfortable putting code forward that could potentially be merged into this huge project.

You don't really need to understand all of the moving parts to make progress here. Here you have very similar handling for other opcodes already, and can follow along by example without needing to understand everything else

Jan 13 '25 01:01 arsenm

I understand what you're saying. My main issue was that I worked on implementing FNEG first as it seemed the most straightforward (one operand, just xor the sign bit), and as you pointed out, it was difficult to test whether it was working as intended. I tried your example out of curiosity and it surprisingly optimized the unnecessary OR with LSB even though I had not even implemented a FCOPYSIGN case yet (I believe it's because the compiler recognizes that the OR instruction doesn't contain the demanded bits, but I'm really not sure). Here's my FNEG implementation that was optimizing the OR out of your example you linked:

  case ISD::FNEG: {
    // Only one operand to FNEG.
    SDValue Op0 = Op.getOperand(0);
    APInt SignBit = APInt::getSignMask(BitWidth);

    // If sign bit isn't demanded, remove FNEG operation
    if (!DemandedBits.intersects(SignBit))
        return TLO.CombineTo(Op, Op0);

    if (SimplifyDemandedBits(Op0, DemandedBits, DemandedElts, Known, TLO, Depth + 1))
      return true;

    // Include more potential optimizations for FNEG (if possible).

    // Check if sign bit is known, and flip the bit if so
    if (Known.Zero[BitWidth - 1] || Known.One[BitWidth - 1]) {
        Known.Zero ^= SignBit;
        Known.One ^= SignBit;
    }
    
    // Not sure what to do if sign bit isn't known

    break;
  }

My current understanding is that TLO.CombineTo replaces the DAG node Op with Op0 effectively removing Op's instruction. Also that KnownBits.Zero and KnownBits.One contain 1 if that bit is known to be a 0 or 1 respectively, or 0 if it's unknown.

Jan 13 '25 02:01 gurvir-dhillon1