llvm icon indicating copy to clipboard operation
llvm copied to clipboard

[SYCL][ABI-Break] Improve Queue fill

Open keyradical opened this issue 1 year ago • 10 comments

This PR changes the queue.fill() implementation to make use of the native functions for a specific backend. It also unifies that implementation with the one for memset, since it is just an 8-bit subset operation of fill.

In the CUDA case, both memset and fill are currently calling urEnqueueUSMFill which depending on the size of the filling pattern calls either cuMemsetD8Async, cuMemsetD16Async, cuMemsetD32Async or commonMemSetLargePattern. Before this patch memset was using the same thing, just beforehand setting patternSize always to 1 byte which resulted in calling cuMemsetD8Async. In other backends, the behaviour is analogous.

The fill method was just invoking a parallel_for to fill the memory with the pattern which was making this operation quite slow.

keyradical avatar May 15 '24 09:05 keyradical

This PR is now ready for review. The only additional changes since https://github.com/intel/llvm/pull/12702 are:

  • added getDeviceBackend() member function to handler and left fill for L0 backend as it was: https://github.com/intel/llvm/pull/13788/commits/dd0cb78f0abf7799c3e3f449eb28015b1dd095b5, https://github.com/intel/llvm/pull/13788/commits/7685a565cfb86153ea5208e41931c21c6d9dedde
  • as a result, fixed DiscardEvents e2e tests to take into consideration different PI symbol for fill based on the target: https://github.com/intel/llvm/pull/13788/commits/679bcd43c9772affe3b56b0ca44d440d413e4b3b

@intel/sycl-graphs-reviewers @intel/dpcpp-nativecpu-pi-reviewers @intel/unified-runtime-reviewers @ldrumm @uditagarwal97, could I get a review on this, please?

keyradical avatar Jun 18 '24 14:06 keyradical

@konradkusiak97 - It looks like this is breaking ABI, based on changes in symbols in the Linux and Windows dumps. Please either reintroduce the symbols and comment that they are unused, or mark this with "[ABI-Break]" in the title and let us know so we can add the corresponding tag.

If an ABI break is needed, these changes will have to wait for the next ABI breaking window, which should be soon. See https://github.com/intel/llvm/blob/sycl/sycl/doc/developer/ABIPolicyGuide.md#changing-abi for more information on ABI-breaks.

steffenlarsen avatar Jun 20 '24 06:06 steffenlarsen

@konradkusiak97 - It looks like this is breaking ABI, based on changes in symbols in the Linux and Windows dumps. Please either reintroduce the symbols and comment that they are unused, or mark this with "[ABI-Break]" in the title and let us know so we can add the corresponding tag.

If an ABI break is needed, these changes will have to wait for the next ABI breaking window, which should be soon. See https://github.com/intel/llvm/blob/sycl/sycl/doc/developer/ABIPolicyGuide.md#changing-abi for more information on ABI-breaks.

@steffenlarsen - thanks for flagging. I added an overload of ext_oneapi_fill_usm_cmd_buffer and marked it as TODO to remove during ABI breaking window. From the looks of symbols linux it seems that this should be fine now. I don't know why symbols windows diff looks the way it does but it seems it just rearranged positions of a few symbols but none of them were removed or modified.

keyradical avatar Jun 21 '24 11:06 keyradical

@konradkusiak97 - It looks like this is breaking ABI, based on changes in symbols in the Linux and Windows dumps. Please either reintroduce the symbols and comment that they are unused, or mark this with "[ABI-Break]" in the title and let us know so we can add the corresponding tag. If an ABI break is needed, these changes will have to wait for the next ABI breaking window, which should be soon. See https://github.com/intel/llvm/blob/sycl/sycl/doc/developer/ABIPolicyGuide.md#changing-abi for more information on ABI-breaks.

@steffenlarsen - thanks for flagging. I added an overload of ext_oneapi_fill_usm_cmd_buffer and marked it as TODO to remove during ABI breaking window. From the looks of symbols linux it seems that this should be fine now. I don't know why symbols windows diff looks the way it does but it seems it just rearranged positions of a few symbols but none of them were removed or modified.

Please create a PR to do that with an abi-break label. We'll start merging those PRs in a few days.

aelovikov-intel avatar Jun 25 '24 16:06 aelovikov-intel

Please create a PR to do that with an abi-break label. We'll start merging those PRs in a few days.

Do you mean to create a seperate PR which removes those TODO unnecessary overloads in memory_manager.hpp, after this PR is merged? If the ABI breaking window is in a few days I could also just make this PR abi breaking and remove those overloading functions.

keyradical avatar Jun 26 '24 08:06 keyradical

Please create a PR to do that with an abi-break label. We'll start merging those PRs in a few days.

Do you mean to create a seperate PR which removes those TODO unnecessary overloads in memory_manager.hpp, after this PR is merged? If the ABI breaking window is in a few days I could also just make this PR abi breaking and remove those overloading functions.

Either would be fine.

aelovikov-intel avatar Jun 26 '24 14:06 aelovikov-intel

For context for the reviewrs: this PR was mostly reviewed in https://github.com/intel/llvm/pull/12702 which was later reverted due to post-commit failure on level-zero. This was worked around in this PR. Previously there was also regression in one of the sycl-cts tests and that was also fixed with with this UR PR: https://github.com/oneapi-src/unified-runtime/pull/1603. So the only new changes are:

  • Left Q.fil() as it was for L0 which required adding getDeviceBackend() member function to handler: https://github.com/intel/llvm/pull/13788/commits/dd0cb78f0abf7799c3e3f449eb28015b1dd095b5, https://github.com/intel/llvm/pull/13788/commits/7685a565cfb86153ea5208e41931c21c6d9dedde
  • as a result, fixed DiscardEvents e2e tests to take into consideration different PI symbol for fill based on the target: https://github.com/intel/llvm/pull/13788/commits/679bcd43c9772affe3b56b0ca44d440d413e4b3b
  • Removed unnecessary overloads of fill_usm and ext_oneapi_fill_usm_cmd_buffer for ABI-break: https://github.com/intel/llvm/pull/13788/commits/38722b481f638dfd858cb4accc7caf3e970fb655

@intel/sycl-graphs-reviewers @intel/dpcpp-nativecpu-pi-reviewers @intel/llvm-reviewers-cuda @intel/unified-runtime-reviewers could you help give a final review for this PR so it can be merged in the coming ABI-breaking window please?

keyradical avatar Jun 28 '24 09:06 keyradical

unadorned char is maybe-signed - so signedness here is host dependent.

Remember C++ has three char types:

char: maybe signed unsigned char: always unsigned signed char: always signed

Only unsigned char is guaranteed to have pure binary representation with no trap behaviour and no padding - and is therefore the only type suitable for type punning/copying other types. See the description of memset

Thus: those vector<char> should become vector<unsigned char> for consistency with memset et al, as well as to ensure that you don't violate aliasing rules

ldrumm avatar Jun 28 '24 11:06 ldrumm

Only unsigned char is guaranteed to have pure binary representation with no trap behaviour and no padding - and is therefore the only type suitable for type punning/copying other types. See the description of memset

Thus: those vector<char> should become vector<unsigned char> for consistency with memset et al, as well as to ensure that you don't violate aliasing rules

Thanks for this tip! I didn't know this before. The thing is, this depends on the already existing member of the handler class, the std::vector<char> MPattern which is used to create command group classes such such as CGFillUSM, CGFill, CGMemset, CGFill2DUSM, CGMemset2DUSM, not sure if that's all. They all seem to use char for the MPattern. I agree though that this should be changed. I can do it in a separate PR or here if you think that it won't pollute this PR with too many unrelated changes. What do you think @ldrumm ?

edit: or I could make the changes only to CGFillUSM which is relevant here and make this temporary workaround in handler.cpp:367:

diff --git a/sycl/source/handler.cpp b/sycl/source/handler.cpp
index 872e31d9b440..51564336ef31 100644
--- a/sycl/source/handler.cpp
+++ b/sycl/source/handler.cpp
@@ -364,10 +364,13 @@ event handler::finalize() {
     CommandGroup.reset(new detail::CGCopyUSM(MSrcPtr, MDstPtr, MLength,
                                              std::move(CGData), MCodeLoc));
     break;
-  case detail::CG::FillUSM:
+  case detail::CG::FillUSM: {
+    std::vector<unsigned char> MPatternU(MPattern.size());
+    std::memcpy(MPatternU.data(), MPattern.data(), MPattern.size());
     CommandGroup.reset(new detail::CGFillUSM(
-        std::move(MPattern), MDstPtr, MLength, std::move(CGData), MCodeLoc));
+        std::move(MPatternU), MDstPtr, MLength, std::move(CGData), MCodeLoc));
     break;
+  }

keyradical avatar Jun 28 '24 13:06 keyradical

From https://en.cppreference.com/w/cpp/language/types:

char — type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type). Multibyte characters strings use this type to represent code units. For every value of type unsigned char in range [​0​, 255], converting the value to char and then back to unsigned char produces the original value.(since C++11) The signedness of char depends on the compiler and the target platform: the defaults for ARM and PowerPC are typically unsigned, the defaults for x86 and x64 are typically signed.

I don't see how unsigned char can be better than just char. And if you want to be really fancy about byte manipulations, just use std::byte.

aelovikov-intel avatar Jun 28 '24 15:06 aelovikov-intel

To make this change from char to unsigned char or std::byte I'd need to change the type of std::vector<char> MPattern, and that requires then changing all the following command group classes which make use of it. All of them such as CGMemset etc. make use of char so this would require changing it in a lot of different places which are not really related to this PR.

keyradical avatar Jul 01 '24 12:07 keyradical

@intel/dpcpp-nativecpu-pi-reviewers @intel/unified-runtime-reviewers could you have a look at this PR? This comment summarizes what has been done since https://github.com/intel/llvm/pull/12702.

keyradical avatar Jul 02 '24 10:07 keyradical

LGTM afters s/char/unsigned char/g,

This wasn't actually that complicated. This is done now @ldrumm.

keyradical avatar Jul 05 '24 09:07 keyradical

@intel/llvm-gatekeepers this is now ready to be merged

keyradical avatar Jul 05 '24 14:07 keyradical