YAKL
Hierarchical Parallelism improvements
Create inner-level parallel reductions using CUB and hipCUB inside the YAKL_reductions.h header, and create simple intrinsic functions for sum, maxval, minval, and product.
Whenever the inner-level loop size exceeds the launch bounds size, strip-mine the loop. This gives the user more flexibility and means they no longer have to size inner loops to fit the launch bounds.
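To make the strip-mining idea concrete, here is a minimal host-side sketch in plain C++. It is not YAKL code: `strip_mined_inner_loop` and `visit_counts` are hypothetical names, and `team_size` stands in for the launch bounds size. The team makes as many passes over the loop as needed, and a bounds check guards the partial last pass.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of YAKL): emulates on the host how a team of
// `team_size` threads could cover an inner loop of `n` iterations when
// n is larger than team_size.
template <class F>
void strip_mined_inner_loop(std::size_t n, std::size_t team_size, F body) {
  if (team_size == 0) return;  // nothing can run without any threads
  // Number of passes ("strips") the team needs to cover all n iterations.
  std::size_t num_strips = (n + team_size - 1) / team_size;
  for (std::size_t strip = 0; strip < num_strips; ++strip) {
    // On a GPU, this inner loop would be the concurrent threads of one block.
    for (std::size_t thread = 0; thread < team_size; ++thread) {
      std::size_t i = strip * team_size + thread;
      if (i < n) body(i);  // guard the partial last strip
    }
  }
}

// Usage sketch: count how many times each iteration index is visited.
std::vector<int> visit_counts(std::size_t n, std::size_t team_size) {
  std::vector<int> hits(n, 0);
  strip_mined_inner_loop(n, team_size, [&](std::size_t i) { hits[i] += 1; });
  return hits;
}
```

With `n = 10` and `team_size = 4` the team makes three passes and every index is visited exactly once, so the user does not have to match the loop size to the launch bounds.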
Also include batched reductions to avoid the need for atomics in most codes!
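As a rough illustration of what a batched reduction computes, the host-side C++ sketch below sums each contiguous segment of an array independently. The name `batched_sum` and the contiguous row-major layout are assumptions made for this example, not an existing YAKL interface. Because each batch writes only its own output slot, no atomic updates to a shared accumulator are needed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of YAKL): reduces `num_batches` contiguous
// segments of length `batch_len` to one sum each.
std::vector<double> batched_sum(const std::vector<double>& data,
                                std::size_t num_batches,
                                std::size_t batch_len) {
  if (data.size() < num_batches * batch_len) {
    throw std::invalid_argument("data is smaller than num_batches * batch_len");
  }
  std::vector<double> result(num_batches, 0.0);
  // On a GPU, each batch could map to one team, and the loop over `i`
  // to an inner-level reduction within that team.
  for (std::size_t b = 0; b < num_batches; ++b) {
    double acc = 0.0;
    for (std::size_t i = 0; i < batch_len; ++i) {
      acc += data[b * batch_len + i];
    }
    result[b] = acc;  // each batch owns its own output slot, so no atomics
  }
  return result;
}
```

For example, `batched_sum({1, 2, 3, 4, 5, 6}, 2, 3)` reduces the two segments `{1, 2, 3}` and `{4, 5, 6}` separately.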
This issue is on pause until the MPAS-Ocean folks need more functionality.
Omitted by hierarchical parallelism removal.