oneDPL
SYCL Rewrites of inclusive_scan_by_segment, exclusive_scan_by_segment, and reduce_by_segment
To increase performance of segmented reduction and segmented scans, the SYCL backend implementations of the algorithms have been rewritten using the SYCL group and sub-group functions.
The new algorithms operate in two primary phases:
- Perform a segmented scan or reduction within each individual work-group with equal work assignment, storing partially completed scan / reduction values to global memory.
- Apply the partial reduction / scan values from previous work-group(s) to the current work-group.
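To make the two phases concrete, here is a minimal sequential sketch in plain C++. It is not the oneDPL implementation and uses no SYCL group or sub-group functions: fixed-size chunks of the input stand in for work-groups, and an ordinary `std::vector` stands in for global memory. The function name `two_phase_inclusive_scan_by_segment`, the `group_size` parameter, and the `partial` / `one_segment` arrays are hypothetical names chosen for illustration, and a segment is assumed to be a run of equal adjacent keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a oneDPL API): inclusive scan by segment, where a
// segment is a run of equal adjacent keys. `group_size` stands in for the
// work-group size; each chunk of that many elements models one work-group.
std::vector<int> two_phase_inclusive_scan_by_segment(
    const std::vector<int>& keys, const std::vector<int>& vals,
    std::size_t group_size)
{
    assert(keys.size() == vals.size() && group_size > 0);
    const std::size_t n = keys.size();
    const std::size_t n_groups = (n + group_size - 1) / group_size;
    std::vector<int> out(n);
    std::vector<int> partial(n_groups);       // local scan value at each group's last element
    std::vector<bool> one_segment(n_groups);  // true if the group contains a single segment

    // Phase 1: segmented scan inside each group, ignoring all other groups.
    for (std::size_t g = 0; g < n_groups; ++g) {
        const std::size_t begin = g * group_size;
        const std::size_t end = std::min(begin + group_size, n);
        one_segment[g] = true;
        for (std::size_t i = begin; i < end; ++i) {
            const bool continues = i > begin && keys[i] == keys[i - 1];
            if (i > begin && !continues) one_segment[g] = false;
            out[i] = continues ? out[i - 1] + vals[i] : vals[i];
        }
        partial[g] = out[end - 1];  // the "partially completed" value for this group
    }

    // Phase 2: add the carry from previous group(s) to the first segment of
    // each group. The walk continues backwards only while a previous group
    // consists entirely of the same segment.
    for (std::size_t g = 1; g < n_groups; ++g) {
        const std::size_t begin = g * group_size;
        const std::size_t end = std::min(begin + group_size, n);
        const int key = keys[begin];
        int carry = 0;
        for (std::size_t p = g; p-- > 0;) {
            const std::size_t p_last = std::min((p + 1) * group_size, n) - 1;
            if (keys[p_last] != key) break;  // segment does not reach back into group p
            carry += partial[p];
            if (!one_segment[p]) break;      // segment started inside group p
        }
        for (std::size_t i = begin; i < end && keys[i] == key; ++i)
            out[i] += carry;
    }
    return out;
}
```

For example, with keys `{1,1,1,1,1,2,2,3}`, all values equal to 1, and a `group_size` of 2, phase 1 produces `{1,2,1,2,1,1,1,1}` and phase 2 corrects it to `{1,2,3,4,5,1,2,1}`. On a device, each group in phase 1 would run in parallel, which is the motivation for separating the carry step.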
Are there any performance measurements after the rewrite? It would be very interesting to have a look.
Rebased to pick up bugfix for legacy range implementation.
@mmichel11 I think this branch needs to be updated from the current state of the main branch.
I have addressed the comments on this PR so far. I still have one question about the header file format that Mikhail brought up, which I have asked in reply to his comment.
This branch has been rebased with the current main, and we are passing our updated tests across our configurations for the scan by segments. We should be in a good state to merge soon.