cccl
[FEA]: Optimize Complex FMA by exploiting lazy evaluation
Is this a duplicate?
- [X] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
libcu++
Is your feature request related to a problem? Please describe.
Given complex numbers `a`, `b`, and `c`, computing `a * b + c` is suboptimal compared to `fma(a, b, c)`; see the `cuCfmaf` implementation in `cuComplex.h`.
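For reference, the difference can be sketched in standard C++. This is a minimal sketch modeled on the four-FMA decomposition that `cuCfmaf` uses, not a copy of it. The function names `mul_add_naive` and `complex_fma` are hypothetical, and `std::complex<float>` stands in for `cuFloatComplex` so the sketch compiles on the host.

```cpp
#include <cmath>
#include <complex>

// Naive evaluation of a * b + c: the product is fully materialized first,
// then c is added component-wise with two separate additions.
std::complex<float> mul_add_naive(std::complex<float> a, std::complex<float> b,
                                  std::complex<float> c) {
    float re = a.real() * b.real() - a.imag() * b.imag();
    float im = a.real() * b.imag() + a.imag() * b.real();
    return {re + c.real(), im + c.imag()};
}

// Fused evaluation (hypothetical name): c is folded into the first partial
// product of each component, so each component costs exactly two fmas.
std::complex<float> complex_fma(std::complex<float> a, std::complex<float> b,
                                std::complex<float> c) {
    float re = std::fma(a.real(), b.real(), c.real());  // ar*br + cr
    float im = std::fma(a.real(), b.imag(), c.imag());  // ar*bi + ci
    re = std::fma(-a.imag(), b.imag(), re);             // - ai*bi
    im = std::fma(a.imag(), b.real(), im);              // + ai*br
    return {re, im};
}
```

For `a = 1+2i`, `b = 3+4i`, `c = 5+6i`, both functions return `(0, 16)`. The naive version needs six to eight floating-point operations, depending on whether the compiler contracts the multiplies into FMAs, while the fused version needs exactly four FMAs and rounds fewer intermediate results.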
Describe the solution you'd like
`a * b` should not compute the result directly; instead, it should return a structure holding its operands to allow lazy evaluation.
This allows `(a * b)` to be fused with `+ c` and generate optimal FMA code.
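The requested lazy evaluation amounts to a small expression template. The sketch below is a minimal illustration under stated assumptions, not a CCCL API: the namespace `sketch`, the stand-in `complex` struct, and the `product` proxy are all hypothetical. `operator*` only records its operands, `operator+` on a product and a complex emits the four FMAs, and converting a bare product to `complex` performs an ordinary multiplication.

```cpp
#include <cmath>

namespace sketch {  // hypothetical namespace, for illustration only

template <typename T>
struct complex {
    T re, im;
};

// Unevaluated product: a * b only stores copies of its operands.
template <typename T>
struct product {
    complex<T> a, b;

    // A bare product (e.g. assigned to a complex) is a plain multiplication.
    operator complex<T>() const {
        return {std::fma(a.re, b.re, -(a.im * b.im)),
                std::fma(a.re, b.im, a.im * b.re)};
    }
};

template <typename T>
product<T> operator*(complex<T> a, complex<T> b) {
    return {a, b};
}

// (a * b) + c: the addition is fused into four fmas.
template <typename T>
complex<T> operator+(product<T> p, complex<T> c) {
    T re = std::fma(p.a.re, p.b.re, c.re);
    T im = std::fma(p.a.re, p.b.im, c.im);
    re = std::fma(-p.a.im, p.b.im, re);
    im = std::fma(p.a.im, p.b.re, im);
    return {re, im};
}

// c + (a * b) is fused the same way.
template <typename T>
complex<T> operator+(complex<T> c, product<T> p) {
    return p + c;
}

}  // namespace sketch
```

With this, `sketch::complex<float> r = a * b + c;` takes the fused path, while `sketch::complex<float> p = a * b;` still yields an ordinary product through the conversion operator.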
Describe alternatives you've considered
No response
Additional context
No response
@fbusato is this a request for `cuda::std::complex`? We wouldn't be able to deviate from the standard on the behavior of `operator*`, but we could always add an extension type like `cuda::complex`. Several independent reasons for a `cuda::complex` type to exist have already come up, so this would be far from the first.
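To make the constraint concrete: the standard specifies that `operator*` on `std::complex<T>` returns `std::complex<T>`, and generic code depends on that. The sketch below (the function name `magnitude_of_product` is hypothetical) compiles only because `a * b` is a real `std::complex<float>`. The complex overload of `std::abs` is a function template, and template argument deduction does not consider user-defined conversions, so a lazy proxy returned from `operator*` would break this call. Lazy evaluation therefore needs a separate type rather than a change to `cuda::std::complex`.

```cpp
#include <complex>
#include <type_traits>

// operator* on std::complex is specified to return std::complex<T>.
static_assert(std::is_same_v<decltype(std::complex<float>{} * std::complex<float>{}),
                             std::complex<float>>);

// std::abs for complex numbers is declared as
//   template <class T> T abs(const std::complex<T>&);
// Deduction ignores user-defined conversions, so if a * b returned a proxy
// type instead of std::complex<float>, this call would no longer compile.
float magnitude_of_product(std::complex<float> a, std::complex<float> b) {
    return std::abs(a * b);
}
```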
I am not fully following here. Are you requesting us to implement expression templates for complex?
I fully understand this constraint. It would be nice to add a `cuda::complex` type if it is not too much effort.