
Implement specialized templates for pow<2>, pow<3>, pow<4>.

Open jeff-cohere opened this issue 5 years ago • 3 comments

Is your feature request related to a problem? Please describe.
To make bit-for-bit testing easier, it would be nice to have specialized implementations of the pow function for low integer exponents. In particular, see this conversation.

Describe the solution you'd like
C++ template specializations for pow<2>, pow<3>, and pow<4>. Perhaps we should include some Fortran support for these and other functions in EKAT as well.

jeff-cohere, Sep 16 '20 23:09

We can probably do a template utility for the generic pow<N> (log2(N) recursions). It should be fairly straightforward.
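
For illustration, a generic compile-time pow<N> with roughly log2(N) recursion depth could look like the sketch below. It is only one possible reading of the suggestion, not code from EKAT or HOMME: the names `IntPow` and `int_pow` are hypothetical, and a real version for device code would presumably need whatever host/device function annotations the project uses. The multiplication order is fixed by N alone, which is the property that matters for bit-for-bit comparisons.

```cpp
// Hypothetical helper (not EKAT's API): x^N for a compile-time,
// non-negative integer N, using exponentiation by squaring.
// The second parameter selects the even or odd partial specialization.
template <unsigned N, bool Even = (N % 2 == 0)>
struct IntPow;

// Base case: x^0 = 1.
template <>
struct IntPow<0, true> {
  template <typename T>
  static T eval(const T&) { return T(1); }
};

// Base case: x^1 = x.
template <>
struct IntPow<1, false> {
  template <typename T>
  static T eval(const T& x) { return x; }
};

// Even N: x^N = (x^(N/2)) * (x^(N/2)).
template <unsigned N>
struct IntPow<N, true> {
  template <typename T>
  static T eval(const T& x) {
    const T half = IntPow<N / 2>::eval(x);
    return half * half;
  }
};

// Odd N > 1: x^N = x * x^(N-1), where N-1 is even.
template <unsigned N>
struct IntPow<N, false> {
  template <typename T>
  static T eval(const T& x) { return x * IntPow<N - 1>::eval(x); }
};

// Convenience wrapper so callers can write int_pow<3>(x).
template <unsigned N, typename T>
T int_pow(const T& x) { return IntPow<N>::eval(x); }
```

With this sketch, `int_pow<2>(x)` expands to `x * x`, `int_pow<3>(x)` to `x * (x * x)`, and `int_pow<4>(x)` to `(x * x) * (x * x)`, so the requested low exponents fall out of the generic case without separate hand-written specializations.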

bartgol, Sep 17 '20 03:09

Do you have any HOMME code or other prior art we can use? Or do you have a new implementation in mind?

jeff-cohere, Sep 17 '20 23:09

I have an impl for a runtime version; should be immediate to convert to templated (or even add both).
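
The implementation referred to here is not reproduced in the thread, so the following is only a minimal sketch of what a runtime integer-exponent pow commonly looks like (binary exponentiation). The name `int_pow_runtime` is hypothetical. For floating-point inputs, a templated counterpart would have to perform its multiplications in the same order to agree with it bit-for-bit.

```cpp
// Hypothetical runtime counterpart (not the implementation mentioned
// in the thread): computes base^exp for a non-negative integer exp
// using O(log2(exp)) multiplications.
template <typename T>
T int_pow_runtime(T base, unsigned exp) {
  T result = T(1);
  while (exp > 0) {
    // If the lowest bit of exp is set, fold the current power into result.
    if (exp & 1u) {
      result *= base;
    }
    exp >>= 1;
    // Square only when more bits remain, avoiding a wasted final multiply.
    if (exp > 0) {
      base *= base;
    }
  }
  return result;
}
```

For example, `int_pow_runtime(x, 4u)` performs two squarings and one final multiply into `result`, regardless of the value of `x`.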

Btw, the bfb_pow_impl function in that file is, imho, a better solution for bfb pow than bridging F90 to CUDA. One might argue that it is expensive, but it might be a wash with the CUDA kernel launch (I never checked though).

bartgol, Sep 17 '20 23:09