Hierarchical Parallelism improvements

[Open] mrnorman opened this issue 2 years ago • 2 comments

Create inner-level parallel reductions using CUB and hipCUB inside the YAKL_reductions.h header, and create simple intrinsic functions for sum, maxval, minval, and product.
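A minimal, host-only C++ sketch of what such inner-level intrinsics could compute. It emulates a block-wide tree reduction serially. The names team_reduce, inner_sum, inner_maxval, inner_minval, and inner_product are hypothetical and are not YAKL's API. On a GPU, the combining step would instead be delegated to cub::BlockReduce (CUDA) or hipcub::BlockReduce (HIP).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: serially emulates a block-wide tree reduction.
// Each pass, "thread" i combines its value with the value held by
// thread i + half, so the number of active values roughly halves per pass.
template <class T, class Op>
T team_reduce(std::vector<T> vals, Op op) {
  assert(!vals.empty());  // precondition: at least one value per team
  std::size_t active = vals.size();
  while (active > 1) {
    std::size_t half = (active + 1) / 2;
    for (std::size_t i = 0; i + half < active; ++i) {
      vals[i] = op(vals[i], vals[i + half]);
    }
    active = half;
  }
  return vals[0];
}

// Hypothetical intrinsics built on the same reduction with different operators.
template <class T>
T inner_sum(const std::vector<T>& v) { return team_reduce(v, std::plus<T>()); }

template <class T>
T inner_product(const std::vector<T>& v) { return team_reduce(v, std::multiplies<T>()); }

template <class T>
T inner_maxval(const std::vector<T>& v) {
  return team_reduce(v, [](const T& a, const T& b) { return std::max(a, b); });
}

template <class T>
T inner_minval(const std::vector<T>& v) {
  return team_reduce(v, [](const T& a, const T& b) { return std::min(a, b); });
}
```

For example, with v = {3, -1, 4, 1, 5}, inner_sum(v) returns 12 and inner_product(v) returns -60. All four intrinsics share one reduction routine and differ only in the combining operator, which mirrors how CUB's BlockReduce takes a reduction functor.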

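Whenever an inner-level loop's size exceeds the launch-bounds size (the maximum number of threads per block), strip-mine the loop so that each thread processes several iterations. This gives users more flexibility, since they no longer have to keep inner loops within the launch bounds.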
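A sketch of the strip-mining idea, assuming launch_bound is the maximum number of threads per block fixed at kernel launch (as with CUDA's __launch_bounds__). strip_mined_inner_for is a hypothetical helper, and the serial tid loop stands in for the block's threads, which would run concurrently on a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: each "thread" tid strides through the inner loop by
// launch_bound, so a loop longer than the block still visits every index
// exactly once. If n <= launch_bound, threads with tid >= n simply do nothing.
template <class F>
void strip_mined_inner_for(std::size_t n, std::size_t launch_bound, F body) {
  assert(launch_bound > 0);
  for (std::size_t tid = 0; tid < launch_bound; ++tid) {  // stands in for the block's threads
    for (std::size_t i = tid; i < n; i += launch_bound) { // strip-mined iterations for this thread
      body(i);
    }
  }
}
```

With n = 1000 and launch_bound = 256, thread 0 handles i = 0, 256, 512, 768, while thread 255 handles i = 255, 511, 767. The user writes the loop body once, and the helper decides how iterations map onto the available threads.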

mrnorman · Jul 14 '22

Also include batched reductions, which would remove the need for atomics in most of the codes.
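Why batching can replace atomics: when many threads add partial results into the same out[b], those updates must be atomic. If each batch is instead reduced by its own team and written once, every output has a single writer. The sketch below assumes contiguous, equal-length batches; batched_sum is a hypothetical name, and the serial batch loop stands in for one GPU team per batch.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical batched sum: data holds nbatch contiguous batches of
// batch_len values each. Each batch is reduced independently and its result
// is written exactly once, so no atomic updates to out are needed.
template <class T>
std::vector<T> batched_sum(const std::vector<T>& data, std::size_t nbatch,
                           std::size_t batch_len) {
  assert(data.size() == nbatch * batch_len);
  std::vector<T> out(nbatch, T(0));
  for (std::size_t b = 0; b < nbatch; ++b) {  // one team per batch on a GPU
    auto first = data.begin() + static_cast<std::ptrdiff_t>(b * batch_len);
    auto last = first + static_cast<std::ptrdiff_t>(batch_len);
    out[b] = std::accumulate(first, last, T(0));  // single writer for out[b]
  }
  return out;
}
```

For example, batched_sum(std::vector<int>{1, 2, 3, 4, 5, 6}, 2, 3) yields {6, 15}.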

mrnorman · Aug 15 '22

This issue is on hold until the MPAS-Ocean team needs more functionality.

mrnorman · Sep 01 '22

Made obsolete by the removal of hierarchical parallelism.

mrnorman · Oct 28 '24