Arturo Vargas
This PR is a collaboration space for exploring optimizations of RAJA launch and its loop abstraction.
Certain backends are not supported; we should reconfigure the framework so that empty files are not required for building.
DRAFT PR. -- This PR adds the option to store thread-block info in the launch ctx, avoiding calls to blockDim.x/y/z inside the loop methods of RAJA launch.
Some of the kernels use `const Real_ptr`; we believe the usage should be `Real_const_ptr`, since `const Real_ptr` only makes the pointer itself const and does not protect the pointed-to data.
- [ ] Add an atomic variant for the mass PA kernel
- [ ] Update Diffusion kernel -- mfem version has been updated
- [ ] Update reference link...
Some kernels have been observed to use blockIdx.x while others use the templated block size. We should do a pass to ensure consistency, and consider different "tuning" versions if we want...
# Summary

This test modifies an existing kernel to use direct threading.
Block-stride loops have been observed to decrease performance on AMD; for better performance, use a direct mapping. Please see the FEM kernels under apps.