[SYCL][ABI-Break] Improve Queue fill
This PR changes the queue.fill() implementation to make use of the native functions for a specific backend. It also unifies that implementation with the one for memset, since it is just an 8-bit subset operation of fill.
In the CUDA case, both memset and fill are currently calling urEnqueueUSMFill which depending on the size of the filling pattern calls either cuMemsetD8Async, cuMemsetD16Async, cuMemsetD32Async or commonMemSetLargePattern. Before this patch memset was using the same thing, just beforehand setting patternSize always to 1 byte which resulted in calling cuMemsetD8Async. In other backends, the behaviour is analogous.
The fill method was just invoking a parallel_for to fill the memory with the pattern which was making this operation quite slow.
This PR is now ready for review. The only additional changes since https://github.com/intel/llvm/pull/12702 are:
- added
getDeviceBackend()member function to handler and leftfillfor L0 backend as it was: https://github.com/intel/llvm/pull/13788/commits/dd0cb78f0abf7799c3e3f449eb28015b1dd095b5, https://github.com/intel/llvm/pull/13788/commits/7685a565cfb86153ea5208e41931c21c6d9dedde - as a result, fixed
DiscardEventse2e tests to take into consideration different PI symbol forfillbased on the target: https://github.com/intel/llvm/pull/13788/commits/679bcd43c9772affe3b56b0ca44d440d413e4b3b
@intel/sycl-graphs-reviewers @intel/dpcpp-nativecpu-pi-reviewers @intel/unified-runtime-reviewers @ldrumm @uditagarwal97, could I get a review on this, please?
@konradkusiak97 - It looks like this is breaking ABI, based on changes in symbols in the Linux and Windows dumps. Please either reintroduce the symbols and comment that they are unused, or mark this with "[ABI-Break]" in the title and let us know so we can add the corresponding tag.
If an ABI break is needed, these changes will have to wait for the next ABI breaking window, which should be soon. See https://github.com/intel/llvm/blob/sycl/sycl/doc/developer/ABIPolicyGuide.md#changing-abi for more information on ABI-breaks.
@konradkusiak97 - It looks like this is breaking ABI, based on changes in symbols in the Linux and Windows dumps. Please either reintroduce the symbols and comment that they are unused, or mark this with "[ABI-Break]" in the title and let us know so we can add the corresponding tag.
If an ABI break is needed, these changes will have to wait for the next ABI breaking window, which should be soon. See https://github.com/intel/llvm/blob/sycl/sycl/doc/developer/ABIPolicyGuide.md#changing-abi for more information on ABI-breaks.
@steffenlarsen - thanks for flagging. I added an overload of ext_oneapi_fill_usm_cmd_buffer and marked it as TODO to remove during ABI breaking window. From the looks of symbols linux it seems that this should be fine now. I don't know why symbols windows diff looks the way it does but it seems it just rearranged positions of a few symbols but none of them were removed or modified.
@konradkusiak97 - It looks like this is breaking ABI, based on changes in symbols in the Linux and Windows dumps. Please either reintroduce the symbols and comment that they are unused, or mark this with "[ABI-Break]" in the title and let us know so we can add the corresponding tag. If an ABI break is needed, these changes will have to wait for the next ABI breaking window, which should be soon. See https://github.com/intel/llvm/blob/sycl/sycl/doc/developer/ABIPolicyGuide.md#changing-abi for more information on ABI-breaks.
@steffenlarsen - thanks for flagging. I added an overload of
ext_oneapi_fill_usm_cmd_bufferand marked it asTODOto remove during ABI breaking window. From the looks of symbols linux it seems that this should be fine now. I don't know why symbols windows diff looks the way it does but it seems it just rearranged positions of a few symbols but none of them were removed or modified.
Please create a PR to do that with an abi-break label. We'll start merging those PRs in a few days.
Please create a PR to do that with an
abi-breaklabel. We'll start merging those PRs in a few days.
Do you mean to create a seperate PR which removes those TODO unnecessary overloads in memory_manager.hpp, after this PR is merged? If the ABI breaking window is in a few days I could also just make this PR abi breaking and remove those overloading functions.
Please create a PR to do that with an
abi-breaklabel. We'll start merging those PRs in a few days.Do you mean to create a seperate PR which removes those
TODOunnecessary overloads in memory_manager.hpp, after this PR is merged? If the ABI breaking window is in a few days I could also just make this PR abi breaking and remove those overloading functions.
Either would be fine.
For context for the reviewrs: this PR was mostly reviewed in https://github.com/intel/llvm/pull/12702 which was later reverted due to post-commit failure on level-zero. This was worked around in this PR. Previously there was also regression in one of the sycl-cts tests and that was also fixed with with this UR PR: https://github.com/oneapi-src/unified-runtime/pull/1603. So the only new changes are:
- Left
Q.fil()as it was for L0 which required addinggetDeviceBackend()member function tohandler: https://github.com/intel/llvm/pull/13788/commits/dd0cb78f0abf7799c3e3f449eb28015b1dd095b5, https://github.com/intel/llvm/pull/13788/commits/7685a565cfb86153ea5208e41931c21c6d9dedde - as a result, fixed
DiscardEventse2e tests to take into consideration different PI symbol forfillbased on the target: https://github.com/intel/llvm/pull/13788/commits/679bcd43c9772affe3b56b0ca44d440d413e4b3b - Removed unnecessary overloads of
fill_usmandext_oneapi_fill_usm_cmd_bufferfor ABI-break: https://github.com/intel/llvm/pull/13788/commits/38722b481f638dfd858cb4accc7caf3e970fb655
@intel/sycl-graphs-reviewers @intel/dpcpp-nativecpu-pi-reviewers @intel/llvm-reviewers-cuda @intel/unified-runtime-reviewers could you help give a final review for this PR so it can be merged in the coming ABI-breaking window please?
unadorned char is maybe-signed - so signedness here is host dependent.
Remember C++ has three char types:
char: maybe signed unsigned char: always unsigned signed char: always signed
Only unsigned char is guaranteed to have pure binary representation with no trap behaviour and no padding - and is therefore the only type suitable for type punning/copying other types. See the description of memset
Thus: those vector<char> should become vector<unsigned char> for consistency with memset et al, as well as to ensure that you don't violate aliasing rules
Only
unsigned charis guaranteed to have pure binary representation with no trap behaviour and no padding - and is therefore the only type suitable for type punning/copying other types. See the description ofmemsetThus: those
vector<char>should becomevector<unsigned char>for consistency withmemsetet al, as well as to ensure that you don't violate aliasing rules
Thanks for this tip! I didn't know this before. The thing is, this depends on the already existing member of the handler class, the std::vector<char> MPattern which is used to create command group classes such such as CGFillUSM, CGFill, CGMemset, CGFill2DUSM, CGMemset2DUSM, not sure if that's all. They all seem to use char for the MPattern. I agree though that this should be changed. I can do it in a separate PR or here if you think that it won't pollute this PR with too many unrelated changes. What do you think @ldrumm ?
edit:
or I could make the changes only to CGFillUSM which is relevant here and make this temporary workaround in handler.cpp:367:
diff --git a/sycl/source/handler.cpp b/sycl/source/handler.cpp
index 872e31d9b440..51564336ef31 100644
--- a/sycl/source/handler.cpp
+++ b/sycl/source/handler.cpp
@@ -364,10 +364,13 @@ event handler::finalize() {
CommandGroup.reset(new detail::CGCopyUSM(MSrcPtr, MDstPtr, MLength,
std::move(CGData), MCodeLoc));
break;
- case detail::CG::FillUSM:
+ case detail::CG::FillUSM: {
+ std::vector<unsigned char> MPatternU(MPattern.size());
+ std::memcpy(MPatternU.data(), MPattern.data(), MPattern.size());
CommandGroup.reset(new detail::CGFillUSM(
- std::move(MPattern), MDstPtr, MLength, std::move(CGData), MCodeLoc));
+ std::move(MPatternU), MDstPtr, MLength, std::move(CGData), MCodeLoc));
break;
+ }
From https://en.cppreference.com/w/cpp/language/types:
char — type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type). Multibyte characters strings use this type to represent code units. For every value of type unsigned char in range [0, 255], converting the value to char and then back to unsigned char produces the original value.(since C++11) The signedness of char depends on the compiler and the target platform: the defaults for ARM and PowerPC are typically unsigned, the defaults for x86 and x64 are typically signed.
I don't see how unsigned char can be better than just char. And if you want to be really fancy about byte manipulations, just use std::byte.
To make this change from char to unsigned char or std::byte I'd need to change the type of std::vector<char> MPattern, and that requires then changing all the following command group classes which make use of it. All of them such as CGMemset etc. make use of char so this would require changing it in a lot of different places which are not really related to this PR.
@intel/dpcpp-nativecpu-pi-reviewers @intel/unified-runtime-reviewers could you have a look at this PR? This comment summarizes what has been done since https://github.com/intel/llvm/pull/12702.
LGTM afters
s/char/unsigned char/g,
This wasn't actually that complicated. This is done now @ldrumm.
@intel/llvm-gatekeepers this is now ready to be merged