clangir icon indicating copy to clipboard operation
clangir copied to clipboard

[CIR][CIRGen] Add complex type and its CIRGen support

Open Lancern opened this issue 11 months ago • 8 comments

This PR adds !cir.complex type to model the _Complex type in C. It also contains support for its CIRGen.

In detail, this patch adds the following CIR types, ops, and attributes:

  • The !cir.complex type is added to model the _Complex type in C. This type is parameterized with the type of the components of the complex number, which must be either an integer type or a floating-point type.
  • The #cir.complex attribute is added to represent a literal value of _Complex type. It is a struct-like attribute that provides the real and imaginary part of the literal _Complex value.
  • The cir.complex.create op is added to create a complex value from its real and imaginary parts.
  • The cir.complex.real and cir.complex.imag op is added to extract the real and imaginary part of a value of !cir.complex type, respectively.

CIRGen support for the new complex type is also added. ~Note the implementation diverges from the original clang CodeGen, where expressions of complex types are handled differently from scalars and aggregates. Instead, this patch treats expressions of complex types as scalars, as such expressions can be simply lowered to a CIR value of !cir.complex type.~

This PR addresses #445 .

Lancern avatar Mar 17 '24 15:03 Lancern

@bcardosolopes Hi Bruno, thanks for your review! Let me explain in more details about why I'm not following the skeleton here.

The original clang CodeGen divide expressions into three categories according to their types: 1) scalar, 2) complex, and 3) aggregate. CodeGen handles these three categories of expressions differently:

  • CodeGen of a scalar expression yields a llvm::Value * that represent the prvalue result of the expression. Note that scalar expressions are always prvalue.
  • CodeGen of a complex expression yields a std::pair<llvm::Value *, llvm::Value *> that represent the prvalue result of the expression. The two llvm::Value * represent the real part and the imaginary part of the result complex value, respectively. Note that complex expressions are also always prvalue.
  • CodeGen of an aggregate expression does not yield anything. Instead, a destination address must be given to the CodeGen and the aggregate produced by the expression will be emitted into that location.

So the CodeGen of scalar expressions and complex expressions are actually very similar. I believe the reason why CodeGen distinguishes them is that CodeGen of scalar expressions yields llvm::Value * while CodeGen of complex expressions yields std::pair<llvm::Value *, llvm::Value *>.

OK, that's enough of background story. In CIR, what makes a difference is that instead of using something like std::pair<mlir::Value, mlir::Value> to represent a complex value, we now have the !cir.complex type which allows us to use a single mlir::Value to represent a complex value. In other words, in CIR, if we keep following the skeleton:

  • CIRGen of a scalar expression yields a mlir::Value that represents the prvalue result.
  • CIRGen of a complex expression also yields a mlir::Value that represents the prvalue result.
  • CIRGen of an aggregate expression keeps what it is.

If we follow the skeleton, the code in CIRGenExprScalar.cpp and CIRGenExprComplex.cpp (which hasn't been invented) would be very very similar to each other, resulting in unnecessary redundancy. Thus I decided to treat expressions of complex types as scalar expressions in CIRGen. This eases the implementation greatly and it still follow the intuition.

I hope this explanation is clear enough for you! If you are still confused please let me know.

Lancern avatar Mar 18 '24 13:03 Lancern

Thanks for the detailed explanation (very clarifying) and for thinking about code duplication!

If we follow the skeleton, the code in CIRGenExprScalar.cpp and CIRGenExprComplex.cpp (which hasn't been invented) would be very very similar to each other, resulting in unnecessary redundancy. Thus I decided to treat expressions of complex types as scalar expressions in CIRGen. This eases the implementation greatly and it still follow the intuition.

clang/lib/CodeGen/CGExprComplex.cpp is quite big. Are you confident that current missing complex codegen will just apply with all scalar stuff? I have the feeling that as soon as something differs, we'll need to change scalar emitters to take complex types into account, and because we're assuming scalar handles it all, we won't have asserts to catch what's not currently implemented - so might end up with bad codegen which will take time to investigate and debug.

I believe duplication will lead us to converge faster and have less issues. How about you add a FIXME(cir): unlike traditional CodeGen, this shares common code with ScalarExprEmitter to every method that is the same between ComplexExprEmitter and ScalarExprEmitter, so that when we complete complex support, we can decide if there's a refactoring approach we could take?

bcardosolopes avatar Mar 19 '24 01:03 bcardosolopes

How about you add a FIXME(cir): unlike traditional CodeGen, this shares common code with ScalarExprEmitter to every method that is the same between ComplexExprEmitter and ScalarExprEmitter, so that when we complete complex support, we can decide if there's a refactoring approach we could take?

OK I believe this is also acceptable and it allows us to evaluate this divergence more deeply before we can make further decisions. I'll move the complex CIRGen code to a new CIRGenComplex.cpp file and keep the upstream code structure for now.

Lancern avatar Mar 23 '24 06:03 Lancern

Rebased onto the latest main, but not yet started to work on the reviews.

Lancern avatar Mar 23 '24 08:03 Lancern

@Lancern sorry, I was out on vacation. Let me know when you address all reviews and update the PR.

bcardosolopes avatar Apr 03 '24 19:04 bcardosolopes

@bcardosolopes Hi Bruno, I have updated the implementation so that it now follows the upstream skeleton.

Lancern avatar Apr 06 '24 03:04 Lancern

Let me know once this is ready for review again!

bcardosolopes avatar Apr 25 '24 22:04 bcardosolopes

@lanza rebased over the weekend, sorry for the churn. Can you rebase this PR again?

bcardosolopes avatar Apr 29 '24 19:04 bcardosolopes

Rebased onto the latest main.

Lancern avatar May 12 '24 12:05 Lancern

@bcardosolopes Hi Bruno, I believe this PR is now ready for another round of review.

I decide to keep this PR simpler (but the size of the PR is still quite big). In the latest version I made the following adjustments:

  • I replaced the #cir.complex attribute with the #cir.imag attribute. The former represents a whole complex number, while the latter represents a complex number that is purely imaginary (i.e. its real part is 0), which can be written in C as a number literal followed by the i suffix (e.g. 3.0i will become #cir.imag<#cir.fp<3.0>>).
  • I replaced the cir.complex.real and cir.complex.imag with cir.complex.get_real and cir.complex.get_imag, respectively. The main difference here is the get_* version yields pointers to the corresponding complex number component just like what cir.get_member is doing. This change is to support the __real__ and __imag__ operator which yield lvalues.
  • I removed a lot of code in CGExprComplex.cpp and left most visitors as unimplemented for now. Only a few fundamental complex number operations are supported in this PR so code review can be easier, including:
    • List initialization of complex numbers that initializes both of the real part and the imaginary part;
    • Imaginary literals;
    • Load and store of complex number values;
    • __real__ and __imag__ operator.

All supported operations are tested in clang/test/CIR/CodeGen/complex.c . Support for more complex number operations can be added in future PRs.

Lancern avatar May 14 '24 14:05 Lancern

After looking the PR, I'm not sure we need #cir.imag, it should just be an int or fp attribute directly.

I prefer keeping the #cir.complex to be used directly with cir.const, it composes well with the rest of CIR consts.

Hi Bruno, thanks for your reviews here. The problem with #cir.complex is that there is no literal expression in C that corresponds to it directly. One may think that the initializer list expression {real, imag} should be lowered to a #cir.complex but this could only happen when real and imag are both constant expressions. Otherwise you have no choice but fallback to cir.complex.create. Of course we could make {real, imag} lower to cir.const #cir.complex by trying to constant-evaluate the two elements during CIRGen.

#cir.imag, on the other hand, corresponds to the literal expression 1i, 2i, etc. In clang these literal expressions are of complex type and this is exactly what I'm trying to model in CIR here.

Lancern avatar Jun 17 '24 15:06 Lancern

Rebased onto the latest main.

Lancern avatar Jun 18 '24 15:06 Lancern

I agree that #cir.imag looks inconsistent and not promising. Removed it from this PR, and now imaginary literals such as 1i is lowered to three ops: two cir.const and one cir.complex.create:

int _Complex c = 3i;
// %0 = cir.const #cir.int<0> : !s32i
// %1 = cir.const #cir.int<3> : !s32i
// %c = cir.complex.create %0, %1 : !s32i -> !cir.complex<!s32i>

Besides, updated #cir.zero so that it can be used for creating zero values of complex type:

int _Complex c;
// cir.global external @c = #cir.zero : !cir.complex<!s32i>

Lancern avatar Jun 18 '24 15:06 Lancern

Ok, I was thinking we could have a fold operation of sorts to canonicalize this

This looks nice. I'll add this in another PR once this PR lands!

Lancern avatar Jun 18 '24 15:06 Lancern

This looks nice. I'll add this in another PR once this PR lands!

Sounds good!

bcardosolopes avatar Jun 20 '24 20:06 bcardosolopes

Rebased onto the latest main.

Lancern avatar Jun 23 '24 05:06 Lancern

Rebased onto the latest main.

Lancern avatar Jun 29 '24 09:06 Lancern

Pending replies to the questions asked.

bcardosolopes avatar Jul 01 '24 19:07 bcardosolopes

Pending replies to the questions asked.

It seems to me that all questions should be resolved? Am I missing some questions? GitHub does not show my response to your latest question in the thread, it's here: https://github.com/llvm/clangir/pull/513#discussion_r1649904257

Lancern avatar Jul 03 '24 05:07 Lancern

It seems to me that all questions should be resolved? Am I missing some questions? GitHub does not show my response to your latest question in the thread, it's here: #513 (comment)

Perfect, got confused by github! All seems good then, thanks for all the hard work here!

bcardosolopes avatar Jul 03 '24 18:07 bcardosolopes