quda
Fused DWF + NVSHMEM
~~This PR makes the DWF fused kernels run with NVSHMEM. Running on one node on Selene with 1x2x2x2 and Ls = 12, getting (performance numbers are in GFLOPS)~~ No, there is no speed-up.
Thanks @maddyscientist. I have updated the doxygen to cover `active`, and also applied the changes to `constantInv` in https://github.com/lattice/quda/pull/1310/commits/4040ff814f9a4e8aa4b3088313e8cea103557aef. I have also tested the `constantInv` code path.