EKAT
Implement specialized templates for pow<2>, pow<3>, pow<4>.
Is your feature request related to a problem? Please describe.
In order to make bit-for-bit testing easier, it would be nice to have specialized implementations of the pow function for low integer exponents. In particular, see this conversation.
Describe the solution you'd like
C++ template specializations for pow<2>, pow<3>, and pow<4>. Perhaps we should include some Fortran support for these and other functions in EKAT as well.
We can probably write a template utility for a generic pow<N>, which would need only about log2(N) levels of recursion. It should be fairly straightforward.
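To make the idea concrete, here is a minimal sketch of such a utility, assuming C++17 (`if constexpr`) and a non-negative compile-time exponent. The name `pow_n` is hypothetical and not an existing EKAT function. It uses repeated squaring, so the recursion depth grows logarithmically with N.

```cpp
// Hypothetical helper (not part of EKAT): computes x^N for a compile-time,
// non-negative integer N using only multiplications.
template <int N, typename T>
constexpr T pow_n(const T& x) {
  static_assert(N >= 0, "pow_n requires a non-negative exponent");
  if constexpr (N == 0) {
    return T(1);
  } else if constexpr (N == 1) {
    return x;
  } else if constexpr (N % 2 == 0) {
    // Even exponent: square the half power.
    const T half = pow_n<N / 2>(x);
    return half * half;
  } else {
    // Odd exponent: peel off one factor, then recurse on the even case.
    return x * pow_n<N - 1>(x);
  }
}
```

With this sketch, `pow_n<2>(x)` expands to `x*x`, `pow_n<3>(x)` to `x*(x*x)`, and `pow_n<4>(x)` to `(x*x)*(x*x)`. Bit-for-bit agreement with another implementation would still depend on that implementation using the same multiplication order.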
Do you have any HOMME code or other prior art we can use? Or do you have a new implementation in mind?
I have an implementation of a runtime version; it should be straightforward to convert it to a templated one (or even add both).
Btw, the bfb_pow_impl function in that file is, imho, a better solution for bfb pow than bridging F90 to CUDA. One might argue that it is expensive, but its cost might be comparable to that of the CUDA kernel launch (I never checked though).
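For illustration, a runtime integer-exponent power of the kind mentioned above could look roughly like the sketch below. This is not the implementation referred to in the thread, and it is not `bfb_pow_impl`; the name `pow_int` is hypothetical, and the sketch assumes a non-negative integer exponent and C++14 or later.

```cpp
// Hypothetical runtime counterpart (not the code referenced in this thread):
// computes base^exp by binary exponentiation, using only multiplications.
template <typename T>
constexpr T pow_int(T base, unsigned int exp) {
  T result = T(1);
  while (exp > 0) {
    if (exp & 1u) {
      // Current bit of the exponent is set: fold this power into the result.
      result = result * base;
    }
    base = base * base;  // Move to the next power of two.
    exp >>= 1u;
  }
  return result;
}
```

Because the loop folds in factors starting from the lowest bit of the exponent, the multiplication order can differ from that of a recursive templated version, which matters if the two are expected to agree bit for bit.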