llvm
llvm copied to clipboard
[SYCL] Make queue fill use native functions
This PR changes the queue.fill() and cgh.fill() implementation to make use of the native functions for a specific backend. It also unifies that implementation with the one for memset, since it is just an 8-bit subset operation of fill.
Both memset and fill are currently calling urEnqueueUSMFill which depending on the size of the filling pattern calls either cuMemsetD8Async, cuMemsetD16Async, cuMemsetD32Async or commonMemSetLargePattern. Before this patch memset was using the same thing, just beforehand setting patternSize always to 1 byte which resulted in calling cuMemsetD8Async.
The fill method was just invoking a parallel_for to fill the memory with the pattern which was making this operation quite slow. This patch improves performance of it by roughly 40% on NVIDIA.
Looks like there is a bug in ROCM prior to 6.0.0 which causes hipMemset2D to behave incorrectly on host-pinned memory. I'm working on a workaround for this.
Hi, can you also update the pi_native_cpu_symbol_check.dump file similarly to what you have done for the other symbol check files? Thank you
This PR is ready for review.
Hi, can you please update the
pi_nativecpu_symbol_check.dumptoo? Thanks
Apologies, I thought I'd already done that but it turns out I haven't, it's fixed now.
The dependent patches in the UR are all merged now so this PR is further ready for review. Friendly ping to @steffenlarsen and @intel/unified-runtime-reviewers
@intel/llvm-gatekeepers this is ready to be merged
@konradkusiak97 - It looks like this may have caused https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Basic/out_of_order_queue_status.cpp to fail on Gen12. Could you please have a look?
@konradkusiak97 - It looks like this may have caused https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Basic/out_of_order_queue_status.cpp to fail on Gen12. Could you please have a look?
I can't reproduce this locally, even with a debug build but I can see the post-commit failures, will investigate further.
@konradkusiak97 - It looks like this may have caused https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Basic/out_of_order_queue_status.cpp to fail on Gen12. Could you please have a look?
I can't reproduce this locally, even with a debug build but I can see the post-commit failures, will investigate further.
Any updates on this? Should we just revert this PR?
@konradkusiak97 - It looks like this may have caused https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Basic/out_of_order_queue_status.cpp to fail on Gen12. Could you please have a look?
I can't reproduce this locally, even with a debug build but I can see the post-commit failures, will investigate further.
Any updates on this? Should we just revert this PR?
I'm still working on the fix so let's revert it for now.