[SYCL][UR] Implement sycl_ext_oneapi_device_wait
This commit adds the UR functionality for device-wide synchronization and the SYCL APIs built on it. The SYCL APIs implement the sycl_ext_oneapi_device_wait extension.
This is probably more of a question for the spec, but it would also affect tests, so I'll ask here. Is the behavior well defined:
- In the presence of L0 interop?
- With tasks running on sub-/parent devices when waiting on the parent/sub-device?
Tag @gmlueck.
I am not certain the spec needs a note about L0 interop, but I agree that it would be useful to document explicit behavior for parent-/sub-device synchronization.
This is a good point. The users asking for this feature want parity with CUDA, but CUDA has no concept of sub-devices or parent devices. I think it would make sense for this API to wait only for commands submitted to the specific device (not for commands submitted to its parent or sub-devices). However, we also need to consider what can be implemented in the backend.
Do we know how the Level Zero API behaves w.r.t. sub-devices / parent devices?
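To make the proposed semantics concrete, here is a rough toy model in plain C++. It is only a sketch of "wait covers commands submitted to this exact device": `ToyDevice`, `submit`, `wait`, and `parent_wait_ignores_sub_device` are hypothetical names for illustration, not the SYCL, UR, or Level Zero API, and the model says nothing about what the backend can actually implement.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model; NOT the SYCL or Unified Runtime API.
struct ToyDevice {
  std::string name;
  ToyDevice *parent = nullptr;             // set for sub-devices only
  std::vector<std::future<void>> pending;  // commands submitted to this device

  // Record a command against this device only.
  template <typename F> void submit(F &&task) {
    pending.push_back(std::async(std::launch::async, std::forward<F>(task)));
  }

  // Wait for commands submitted to this device. The parent and any
  // sub-devices are deliberately not consulted (the assumed semantics).
  void wait() {
    for (auto &f : pending)
      f.get();
    pending.clear();
  }
};

// Returns true if waiting on the parent completes while a command on the
// sub-device is still running.
bool parent_wait_ignores_sub_device() {
  ToyDevice parent{"parent"};
  ToyDevice sub{"sub", &parent};

  // The sub-device command blocks until we release this gate.
  std::promise<void> release;
  std::shared_future<void> gate = release.get_future().share();

  bool parent_done = false;
  parent.submit([&parent_done] { parent_done = true; });
  sub.submit([gate] { gate.wait(); });

  parent.wait();  // must return even though the sub-device is busy

  bool sub_still_running =
      sub.pending.front().wait_for(std::chrono::seconds(0)) !=
      std::future_status::ready;

  release.set_value();  // let the sub-device command finish
  sub.wait();

  return parent_done && sub_still_running && sub.pending.empty();
}
```

Under this model, a test for the parent/sub-device case would submit a long-running command to a sub-device and check that waiting on the parent does not block on it. Whether that expectation can hold depends on the Level Zero behavior asked about above.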
@intel/llvm-reviewers-runtime & @intel/dpcpp-tools-reviewers - Linux drivers now support device-wide synchronization, so this patch is ready for review! 🥳