[FEA]: Optimize Complex FMA by exploiting lazy evaluation

Open fbusato opened this issue 11 months ago • 3 comments

Is this a duplicate?

  • [X] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

libcu++

Is your feature request related to a problem? Please describe.

Given complex numbers a, b, and c, a * b + c is suboptimal compared to fma(a, b, c); see the cuCfmaf implementation in cuComplex.h.
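For illustration, a fused complex multiply-add can be written as four scalar FMAs, in the spirit of cuCfmaf. This is a minimal host-side sketch, not the cuComplex.h source: the name complex_fma is hypothetical, and std::complex<float> with std::fma stand in for the CUDA types and intrinsics.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper (not part of libcu++ or cuComplex.h):
// computes a * b + c using four fused multiply-adds.
inline std::complex<float> complex_fma(std::complex<float> a,
                                       std::complex<float> b,
                                       std::complex<float> c)
{
    // First pair of FMAs folds the addend c into the partial products.
    float re = std::fma(a.real(), b.real(), c.real());
    float im = std::fma(a.real(), b.imag(), c.imag());
    // Second pair accumulates the remaining cross terms.
    re = std::fma(-a.imag(), b.imag(), re);
    im = std::fma(a.imag(), b.real(), im);
    return {re, im};
}
```

By contrast, evaluating a * b first and then adding c produces a fully rounded product before the addition, so the addend cannot be folded into the multiplies unless the compiler is allowed to contract the expression.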

Describe the solution you'd like

a * b should not compute the result directly; instead it should return a structure holding both operands to allow lazy evaluation. This lets (a * b) be fused with + c to generate optimal FMA code.
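A minimal sketch of the idea, assuming a standalone complex type rather than cuda::std::complex. The names cplx and cplx_product are hypothetical and exist only for this example; they are not an existing libcu++ API.

```cpp
#include <cmath>

// Hypothetical complex type used only for this illustration.
struct cplx {
    float re;
    float im;
};

// Proxy returned by operator*: it stores the operands instead of the result.
struct cplx_product {
    cplx a;
    cplx b;

    // Evaluate the product on demand when a plain value is required.
    operator cplx() const {
        return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
    }
};

// Multiplication is lazy: no arithmetic happens here.
inline cplx_product operator*(cplx a, cplx b) { return {a, b}; }

// Adding to a pending product fuses both steps into four FMAs.
inline cplx operator+(cplx_product p, cplx c) {
    float re = std::fma(p.a.re, p.b.re, c.re);
    float im = std::fma(p.a.re, p.b.im, c.im);
    re = std::fma(-p.a.im, p.b.im, re);
    im = std::fma(p.a.im, p.b.re, im);
    return {re, im};
}

inline cplx operator+(cplx c, cplx_product p) { return p + c; }
```

With these overloads, cplx r = a * b + c; takes the fused path, while cplx p = a * b; still yields an ordinary product through the conversion operator. One known cost of this design is that auto p = a * b; deduces the proxy type rather than cplx.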

Describe alternatives you've considered

No response

Additional context

No response

fbusato avatar Mar 11 '24 21:03 fbusato

@fbusato is this a request for cuda::std::complex? We wouldn't be able to deviate from the standard on the behavior of operator*, but we could always add an extension type like cuda::complex. Several independent reasons for a cuda::complex type to exist have already come up, so this would be far from the first.

jrhemstad avatar Mar 15 '24 00:03 jrhemstad

I am not fully following here. Are you requesting that we implement expression templates for complex?

miscco avatar Mar 15 '24 07:03 miscco

I perfectly understand this constraint. It would be nice to add a cuda::complex type if it is not too much effort.

fbusato avatar Mar 18 '24 17:03 fbusato