Reduction abstraction needs to be applied to choice of MPI reducer in `TunableReduction`
The reduction abstraction is presently broken for non-summation reductions. While the abstracted launch can be passed different reducers for the kernel, the subsequent MPI reduction always assumes that a summation is being performed. This will break, for example, force monitoring, computing the maximum element for half-precision multigrid, etc.
```cpp
if (!commAsyncReduction()) {
  arg.complete(result, stream);
  if (!activeTuning() && commGlobalReduction()) {
    // FIXME - this will break when we have non-summation reductions
    // (MG fixed point will break and so will force monitoring)
    comm_allreduce_array((double *)result.data(), result.size() * sizeof(T) / sizeof(double));
  }
}
```