oneDPL
Investigate a 32-bit addressing mode in SYCL kernels
These changes make sense to me. There might be potential benefits of utilizing 32-bit addressing on other algorithms as well. This is beyond this PR but maybe something to keep in mind.
Originally posted by @julianmi in https://github.com/oneapi-src/oneDPL/pull/1446#pullrequestreview-1954920510
There are several places in our codebase now where we define separate kernels for inputs that can be indexed by a 32-bit unsigned integer and inputs with a size type that must be 64-bit. We then perform a runtime check to select the best kernel. This has had significant performance benefit in these cases.
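As a rough illustration of the pattern described above, here is a minimal host-only sketch in plain C++. It is not the oneDPL code: `sum_with_index` and `sum_dispatch` are hypothetical names, and an ordinary loop stands in for the SYCL kernel body. The body is templated on the index type, and a runtime size check picks the 32-bit or 64-bit instantiation.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a kernel body, templated on the index type.
// In a real SYCL kernel this loop would be the per-work-item index arithmetic.
template <typename Index>
std::int64_t sum_with_index(const std::vector<int>& data) {
    std::int64_t total = 0;
    const Index n = static_cast<Index>(data.size());
    for (Index i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Hypothetical dispatcher: both instantiations are compiled, and the runtime
// check selects the 32-bit one whenever the size fits in std::uint32_t.
// `used_32bit` is an optional out-parameter added only so the choice is observable.
std::int64_t sum_dispatch(const std::vector<int>& data, bool* used_32bit = nullptr) {
    const bool fits =
        data.size() <= std::numeric_limits<std::uint32_t>::max();
    if (used_32bit != nullptr) {
        *used_32bit = fits;
    }
    return fits ? sum_with_index<std::uint32_t>(data)
                : sum_with_index<std::uint64_t>(data);
}
```

The cost of this approach is that every such kernel is compiled twice, which is part of why it is worth checking whether the benefit is broad enough to justify doing it everywhere.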
By default, oneDPL algorithms seem to use 64-bit addressing. We should investigate whether 32-bit addressing gives a broad performance benefit where it is supported, and what the best way is to expose this option to users.
I have removed this from the 2022.7.0 milestone. Based on some initial investigation, widespread performance benefit is not observed with this change. The benefit seems to be limited to kernels with heavy addressing work that can't be hidden by other operations. Our current approaches seem to work well for the few cases where this is greatly beneficial: compiling multiple kernels with a run-time dispatch (as seen in our merge kernel), or incurring a small amount of overhead at the start of the kernel to select the right indexing approach (as seen in our vectorized binary search kernels).
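For the second approach mentioned above (choosing the index width at the start of the kernel rather than on the host), a minimal plain C++ sketch might look like the following. This is an assumption about the general shape, not the actual oneDPL binary search kernel; `lower_bound_impl` and `lower_bound_select` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical lower-bound search templated on the index type.
// Returns the first position whose element is not less than `value`.
template <typename Index>
Index lower_bound_impl(const int* data, Index n, int value) {
    Index lo = 0;
    Index hi = n;
    while (lo < hi) {
        const Index mid = lo + (hi - lo) / 2;
        if (data[mid] < value) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

// Hypothetical single entry point: one branch up front selects the index
// width, so the search loop itself runs entirely in the narrower type
// whenever the input size allows it.
std::size_t lower_bound_select(const int* data, std::size_t n, int value) {
    if (n <= std::numeric_limits<std::uint32_t>::max()) {
        return lower_bound_impl<std::uint32_t>(
            data, static_cast<std::uint32_t>(n), value);
    }
    return lower_bound_impl<std::uint64_t>(
        data, static_cast<std::uint64_t>(n), value);
}
```

Here the only added overhead is the single size comparison before the loop, which matches the "small amount of overhead at the start" trade-off described in the comment.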
There is still a small benefit I can observe (up to a few percent in some scan algorithms). I am leaving this open for now in case we wish to investigate further in the future. However, I think it is low priority for the time being.