Matthew Michel
To increase performance of segmented reduction and segmented scans, the SYCL backend implementations of the algorithms have been rewritten using the SYCL group and sub-group functions. The new algorithms operate...
An implementation of transform_output_iterator, an iterator that performs a given transformation upon writes only and avoids unnecessary intermediate storage. Closes #404
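The idea of transforming on write can be illustrated with a minimal standard C++ sketch. It is not the oneDPL implementation: the names `transform_output_sketch`, `times_two`, and `doubled_copy` are hypothetical and exist only for this illustration. Dereferencing yields a proxy whose assignment applies the functor and writes the result straight to the wrapped iterator, so no intermediate buffer is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical sketch of a write-only transforming iterator (not the oneDPL class).
template <typename OutputIt, typename UnaryFunc>
class transform_output_sketch {
    OutputIt it_;
    UnaryFunc func_;

    // Proxy returned by operator*: assigning to it applies func_ and
    // stores the result through the wrapped iterator.
    class proxy {
        OutputIt it_;
        UnaryFunc func_;
    public:
        proxy(OutputIt it, UnaryFunc f) : it_(it), func_(f) {}
        template <typename T>
        proxy& operator=(T&& value) {
            *it_ = func_(std::forward<T>(value));
            return *this;
        }
    };

public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    transform_output_sketch(OutputIt it, UnaryFunc f) : it_(it), func_(f) {}

    proxy operator*() const { return proxy(it_, func_); }
    transform_output_sketch& operator++() { ++it_; return *this; }
    transform_output_sketch operator++(int) { auto tmp = *this; ++it_; return tmp; }
};

// Hypothetical functor used for the demonstration.
struct times_two {
    int operator()(int x) const { return 2 * x; }
};

// Copies `in` into a new vector, doubling each element as it is written.
std::vector<int> doubled_copy(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    transform_output_sketch<std::vector<int>::iterator, times_two> sink(out.begin(), times_two{});
    std::copy(in.begin(), in.end(), sink);
    return out;
}
```

Calling `doubled_copy` on a vector holding 1, 2, 3 fills the output with 2, 4, 6; the transformation runs only when the algorithm assigns through the iterator.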
This PR introduces some additional tuning for the vectorized search algorithms (binary_search, lower_bound, and upper_bound) where a series of searches across a provided set of search keys are performed on...
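As a serial point of reference for what a vectorized search computes, the sketch below runs one `std::lower_bound` per search key and records the resulting index. It assumes the output is an index per key; the helper name `lower_bound_all` is hypothetical, and nothing here reflects the tuning the PR describes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical serial reference: for every key, store the index of the
// first element in `sorted` that is not less than the key.
std::vector<std::size_t> lower_bound_all(const std::vector<int>& sorted,
                                         const std::vector<int>& keys) {
    std::vector<std::size_t> result;
    result.reserve(keys.size());
    for (int key : keys) {
        auto pos = std::lower_bound(sorted.begin(), sorted.end(), key);
        result.push_back(static_cast<std::size_t>(pos - sorted.begin()));
    }
    return result;
}
```

Each key's search is independent of the others, which is what makes the pattern a natural fit for running one search per work-item.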
`__find_start_point` is the first step in the merge pattern prior to serial merging. One of the current optimizations employed in the merge pattern is to restrict the `_Index` size type...
Currently with C++20, `oneapi::dpl::zip_iterator` cannot satisfy the C++20 [std::input_iterator](https://en.cppreference.com/w/cpp/iterator/input_iterator) constraint due to its failure to satisfy [std::indirectly_readable](https://en.cppreference.com/w/cpp/iterator/indirectly_readable) which requires that the iterator's value and reference types have a common reference....
Performance issues have been identified within our `parallel_for` kernel on Intel® Data Center GPU Max for input sizes that are not powers of two. The root cause of this has...
These changes make sense to me. There might be potential benefits of utilizing 32-bit addressing on other algorithms as well. This is beyond this PR but maybe something to keep...
Reverse counter iterator looks a little bit strange to me. As far as I know, we already have `oneapi::dpl::counting_iterator`. Is it not enough? Also when I see the API...
Our current CMake AOT compilation mode sets `-fsycl-targets=spir64_gen` for GPU targets and `-fsycl-targets=spir64_x86_64` for CPU targets. This is problematic for two reasons: The first is that some backends do not...
## Summary
This PR implements a SYCL backend `reduce_by_segment` by using higher level calls to reduce-then-scan along with new specialty functors to achieve a segmented reduction. This PR is an...
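For readers unfamiliar with segmented reduction, a serial reference may help. The sketch below assumes that a segment is a run of consecutive equal keys and that values are combined with addition; it is not the reduce-then-scan approach the PR describes, and `reduce_by_segment_reference` is a hypothetical name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical serial reference: collapse each run of consecutive equal keys
// into one output key and the sum of the corresponding values.
std::pair<std::vector<int>, std::vector<int>>
reduce_by_segment_reference(const std::vector<int>& keys, const std::vector<int>& values) {
    std::vector<int> out_keys;
    std::vector<int> out_values;
    // Only process positions that exist in both inputs.
    const std::size_t n = keys.size() < values.size() ? keys.size() : values.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i == 0 || keys[i] != keys[i - 1]) {
            // A new segment starts here.
            out_keys.push_back(keys[i]);
            out_values.push_back(values[i]);
        } else {
            // Same segment as the previous element: accumulate.
            out_values.back() += values[i];
        }
    }
    return {out_keys, out_values};
}
```

Note that a key reappearing after a different key starts a new segment rather than merging with the earlier one.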