
Support for partition_spaces and separate execution space instances (GPU streams) in Kokkos Kernels.

Open wlruys opened this issue 4 years ago • 1 comments

Opening an issue for this after a conversation on Slack. (feature-request)

Now that CUDA/HIP/SYCL stream support and partition_spaces are developed and more stable in Kokkos Core, it would be great to have this support in Kokkos Kernels as well.

This would allow dispatching BLAS and other kernels of 'medium' size: too large for a single thread team (one block), yet too small to be worth occupying the whole device.

For instance something like:

ExecSpace spaces[N];
partition_space(ExecSpace(),N,spaces);
KokkosBlas::GEMM(spaces[0], "N", "N", one, A0, B0, one, C0);
KokkosBlas::GEMM(spaces[1], "N", "N", one, A1, B1, one, C1);

to dispatch the two kernels asynchronously.
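
To make the intended launch pattern concrete, here is a minimal host-only sketch in standard C++. It is an analogy under stated assumptions, not Kokkos or Kokkos Kernels code: `Matrix`, `gemm`, and `two_async_gemms` are hypothetical names made up for illustration, `std::async` stands in for separate execution space instances, and waiting on the futures stands in for fencing each instance.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Row-major dense matrix; a hypothetical stand-in for a Kokkos::View.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix(std::size_t r, std::size_t c, double v = 0.0)
      : rows(r), cols(c), data(r * c, v) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Naive C = alpha * A * B + beta * C with no transposes ("N", "N").
void gemm(double alpha, const Matrix& A, const Matrix& B, double beta, Matrix& C) {
  assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
  for (std::size_t i = 0; i < C.rows; ++i) {
    for (std::size_t j = 0; j < C.cols; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < A.cols; ++k) sum += A(i, k) * B(k, j);
      C(i, j) = alpha * sum + beta * C(i, j);
    }
  }
}

// Launch two independent GEMMs without waiting in between, then wait on both.
// Each std::async task plays the role of one execution space instance.
// C0 and C1 must not alias, otherwise the two tasks would race.
void two_async_gemms(const Matrix& A0, const Matrix& B0, Matrix& C0,
                     const Matrix& A1, const Matrix& B1, Matrix& C1) {
  const double one = 1.0;
  auto f0 = std::async(std::launch::async, [&] { gemm(one, A0, B0, one, C0); });
  auto f1 = std::async(std::launch::async, [&] { gemm(one, A1, B1, one, C1); });
  f0.get();  // analogous to fencing the first instance
  f1.get();  // analogous to fencing the second instance
}
```

The point of the pattern is that neither launch blocks the other, and synchronization happens once per instance at the end instead of across the whole device.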

wlruys avatar Sep 29 '21 19:09 wlruys

@dialecticDolt I merged the work on this feature in PR #1131. Let me know if that meets your requirements. If so, we can probably close this issue; otherwise, let's discuss what more is needed.

lucbv avatar Oct 25 '21 16:10 lucbv