Adam Fidel
This PR provides an implementation of the single-pass scan algorithm as a kernel template.
This PR allows us to change the work-group size and iterations per work-item for the scan family of algorithms. We choose values for these parameters that optimize performance for devices...
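As a rough illustration of how these two parameters interact, the sketch below computes how many elements one work-group covers and how many work-groups an input of a given size needs. The `scan_params` struct and both helper functions are hypothetical names invented for this example; they are not part of oneDPL's API, and the values in the usage note are arbitrary rather than the tuned values the PR selects.

```cpp
#include <cstddef>

// Hypothetical tuning parameters (illustrative names, not oneDPL's API).
struct scan_params {
    std::size_t work_group_size;      // work-items per work-group
    std::size_t iters_per_work_item;  // elements each work-item processes
};

// Number of input elements covered by a single work-group.
constexpr std::size_t elements_per_group(scan_params p) {
    return p.work_group_size * p.iters_per_work_item;
}

// Number of work-groups needed to cover n elements (ceiling division).
constexpr std::size_t num_groups(std::size_t n, scan_params p) {
    const std::size_t tile = elements_per_group(p);
    return (n + tile - 1) / tile;
}
```

For example, with a work-group size of 256 and 8 iterations per work-item, each work-group covers 2048 elements, so an input of 1,000,000 elements needs 489 work-groups.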
There are a few potential issues with oneDPL's global `sycl::queue`, including:
- Requirement that a SYCL device is available whenever oneDPL is included, as mentioned in #1060
- Static initialization...
This PR adds documentation for the scan kernel template.
On some platforms, specifically Fedora 40 with GCC 14, the included `pstl_config.h` defines the macros `_PSTL_UDR_PRESENT` and `_PSTL_UDS_PRESENT` but leaves them empty. This is...
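To show why a defined-but-empty macro is awkward to test, here is a minimal sketch using made-up macros (`DEMO_EMPTY_MACRO`, `DEMO_ENABLED_MACRO`, `DEMO_UNDEFINED_MACRO`) in place of the real `_PSTL_*` ones. A bare `#if DEMO_EMPTY_MACRO` is ill-formed because the directive is left with no expression, while `#ifdef` reports the macro as present. The `(MACRO + 0)` idiom below is one common way to treat empty, undefined, and `0` uniformly as "off"; it is shown as a general technique, not as the fix oneDPL adopted.

```cpp
// Hypothetical stand-ins for the macros discussed above.
#define DEMO_EMPTY_MACRO        // defined, but expands to nothing
#define DEMO_ENABLED_MACRO 1    // defined with a usable value
// DEMO_UNDEFINED_MACRO is intentionally left undefined.

// "(+ 0)" is a valid expression, so the empty macro evaluates to 0 here.
#if (DEMO_EMPTY_MACRO + 0)
constexpr bool empty_macro_enabled = true;
#else
constexpr bool empty_macro_enabled = false;
#endif

// "(1 + 0)" evaluates to 1.
#if (DEMO_ENABLED_MACRO + 0)
constexpr bool enabled_macro_enabled = true;
#else
constexpr bool enabled_macro_enabled = false;
#endif

// Undefined identifiers become 0 inside #if, giving "(0 + 0)".
#if (DEMO_UNDEFINED_MACRO + 0)
constexpr bool undefined_macro_enabled = true;
#else
constexpr bool undefined_macro_enabled = false;
#endif

// #ifdef only checks for a definition, so it is true for the empty macro.
#ifdef DEMO_EMPTY_MACRO
constexpr bool empty_macro_defined = true;
#else
constexpr bool empty_macro_defined = false;
#endif
```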
From https://github.com/oneapi-src/oneDPL/pull/1320#pullrequestreview-2018418038: > Memory footprint in SLM of joint algorithms, and how to communicate memory footprint requirements to the end user to provide them the proper guidance for selecting kernel...
It may be possible to reduce the number of atomics in the scan kernel template algorithm. Currently, we use atomic loads/stores for both the status flags and status values, but...
PR #1320 introduced a version of the scan kernel template that does not allow the user to specify an initial value. This is a useful feature to have.
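For readers unfamiliar with what an initial value does in a scan, the sketch below uses the C++17 standard library's `std::inclusive_scan` rather than the kernel template itself, whose interface is not shown here. Choosing an inclusive scan with addition is an assumption made for illustration, and the two wrapper functions are hypothetical helpers.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Inclusive prefix sum without an initial value.
std::vector<int> inclusive_scan_no_init(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}

// Inclusive prefix sum where `init` is folded into every output element.
std::vector<int> inclusive_scan_with_init(const std::vector<int>& in, int init) {
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin(), std::plus<>{}, init);
    return out;
}
```

With input `{1, 2, 3, 4}`, the first function yields `{1, 3, 6, 10}`, and the second with `init = 10` yields `{11, 13, 16, 20}`.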