llvm icon indicating copy to clipboard operation
llvm copied to clipboard

[SYCL] Make queue fill use native functions

Open konradkusiak97 opened this issue 1 year ago • 1 comments

This PR changes the queue.fill() and cgh.fill() implementation to make use of the native functions for a specific backend. It also unifies that implementation with the one for memset, since it is just an 8-bit subset operation of fill.

Both memset and fill are currently calling urEnqueueUSMFill which depending on the size of the filling pattern calls either cuMemsetD8Async, cuMemsetD16Async, cuMemsetD32Async or commonMemSetLargePattern. Before this patch memset was using the same thing, just beforehand setting patternSize always to 1 byte which resulted in calling cuMemsetD8Async.

The fill method was just invoking a parallel_for to fill the memory with the pattern which was making this operation quite slow. This patch improves performance of it by roughly 40% on NVIDIA.

konradkusiak97 avatar Feb 13 '24 10:02 konradkusiak97

Looks like there is a bug in ROCM prior to 6.0.0 which causes hipMemset2D to behave incorrectly on host-pinned memory. I'm working on a workaround for this.

konradkusiak97 avatar Feb 22 '24 12:02 konradkusiak97

Hi, can you also update the pi_native_cpu_symbol_check.dump file similarly to what you have done for the other symbol check files? Thank you

PietroGhg avatar Mar 08 '24 14:03 PietroGhg

This PR is ready for review.

konradkusiak97 avatar Mar 29 '24 11:03 konradkusiak97

Hi, can you please update the pi_nativecpu_symbol_check.dump too? Thanks

Apologies, I thought I'd already done that but it turns out I haven't, it's fixed now.

konradkusiak97 avatar Apr 02 '24 09:04 konradkusiak97

The dependent patches in the UR are all merged now so this PR is further ready for review. Friendly ping to @steffenlarsen and @intel/unified-runtime-reviewers

konradkusiak97 avatar May 02 '24 14:05 konradkusiak97

@intel/llvm-gatekeepers this is ready to be merged

konradkusiak97 avatar May 02 '24 20:05 konradkusiak97

@konradkusiak97 - It looks like this may have caused https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Basic/out_of_order_queue_status.cpp to fail on Gen12. Could you please have a look?

steffenlarsen avatar May 03 '24 12:05 steffenlarsen

@konradkusiak97 - It looks like this may have caused https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Basic/out_of_order_queue_status.cpp to fail on Gen12. Could you please have a look?

I can't reproduce this locally, even with a debug build but I can see the post-commit failures, will investigate further.

konradkusiak97 avatar May 03 '24 14:05 konradkusiak97

@konradkusiak97 - It looks like this may have caused https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Basic/out_of_order_queue_status.cpp to fail on Gen12. Could you please have a look?

I can't reproduce this locally, even with a debug build but I can see the post-commit failures, will investigate further.

Any updates on this? Should we just revert this PR?

aelovikov-intel avatar May 06 '24 22:05 aelovikov-intel

@konradkusiak97 - It looks like this may have caused https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Basic/out_of_order_queue_status.cpp to fail on Gen12. Could you please have a look?

I can't reproduce this locally, even with a debug build but I can see the post-commit failures, will investigate further.

Any updates on this? Should we just revert this PR?

I'm still working on the fix so let's revert it for now.

konradkusiak97 avatar May 06 '24 22:05 konradkusiak97