gen - add support for mixed precision operators

Open zatkins-dev opened this issue 6 months ago • 2 comments

Adds support for mixed precision operators, set at creation time, for the CUDA gen backend. This works by defining a second scalar type, CeedScalarCPU, which is always double precision, and changing the CeedScalar type during the JiT pass if the CEED_JIT_MIXED_PRECISION define is set.
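A minimal sketch of the type switch described above, written as plain C++ so it can be compiled on the host. The type names and the CEED_JIT_MIXED_PRECISION define come from this description; using float as the reduced precision type and the IsMixedPrecision helper are assumptions made for illustration, not the actual header contents.

```cpp
#include <type_traits>

// Host-side scalar: always double precision, as described above.
typedef double CeedScalarCPU;

// Kernel-side scalar: switched when the JiT pass sets CEED_JIT_MIXED_PRECISION.
// Using float as the reduced precision type is an assumption for this sketch.
#ifdef CEED_JIT_MIXED_PRECISION
typedef float CeedScalar;
#else
typedef double CeedScalar;
#endif

// Hypothetical helper (not part of libCEED): true when the kernel scalar
// differs from the host scalar, i.e. this is a mixed precision build.
constexpr bool IsMixedPrecision() {
  return !std::is_same<CeedScalar, CeedScalarCPU>::value;
}
```

Compiling the same source with and without `-DCEED_JIT_MIXED_PRECISION` changes only the meaning of CeedScalar, which is what keeps the kernel source shared between the two modes.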

The inputs to the device kernels are always CeedScalarCPU arrays, which avoids having to manage multiple pointers in a CeedVector. In gen, we only touch the input and output arrays at the beginning and end of the kernel, so all of the computation happens on the single-precision CeedScalar arrays we copy values into. This approach minimizes the code differences between mixed and full precision runs: the helper functions only need extra template parameters to ensure the input types are correct.
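The convert-on-entry, convert-on-exit pattern can be sketched on the host as follows. This is an illustration under stated assumptions, not the backend code: ReadAndConvert and ScaleKernel are hypothetical names, CeedScalar is fixed to float to mimic a mixed precision run, and a plain loop stands in for the device kernel body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef double CeedScalarCPU;  // type of the kernel input/output arrays
typedef float  CeedScalar;     // compute type in a mixed precision run (assumed)

// Hypothetical helper: the extra template parameters make the source and
// destination scalar types explicit, so one body serves both precisions.
template <typename ScalarIn, typename ScalarOut>
inline void ReadAndConvert(std::size_t n, const ScalarIn *in, ScalarOut *out) {
  for (std::size_t i = 0; i < n; i++) out[i] = static_cast<ScalarOut>(in[i]);
}

// Stand-in for a kernel body: convert on entry, compute in CeedScalar,
// convert back on exit. Only the first and last steps see CeedScalarCPU.
inline void ScaleKernel(std::size_t n, const CeedScalarCPU *d_u, CeedScalarCPU *d_v) {
  assert(d_u != nullptr && d_v != nullptr);
  std::vector<CeedScalar> r_u(n), r_v(n);
  ReadAndConvert<CeedScalarCPU, CeedScalar>(n, d_u, r_u.data());
  for (std::size_t i = 0; i < n; i++) r_v[i] = CeedScalar(2) * r_u[i];
  ReadAndConvert<CeedScalar, CeedScalarCPU>(n, r_v.data(), d_v);
}
```

With CeedScalar set to double instead, both conversions become plain copies and the middle of the kernel is unchanged.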

The support for mixed precision operators is at the backend level, while the actual usage of mixed precision operations is defined per-operator to provide maximal flexibility.

This can be extended to the CUDA ref backend too, though the benefits will likely be smaller.

@jeremylt and @nbeams, does this seem like a reasonable approach?

Jul 10 '25 23:07 zatkins-dev