Andrey Alekseenko
> Function attributes in SYCL are broken and were a mistake, IMO. But, for all their limitations, they allow for optimizations possible in CUDA/HIP, but not otherwise possible with SYCL...
> The question whether we expect problem sizes to be all over the place such that they are different for every application run.

For GROMACS, definitely :) We have fixed...
The problem is still present as of 54a67eb2c1cf275cef4d12b56b0b0786db26cbab
My results with 6.2.1 kernel for Arc A770:

```
Platform: Intel(R) OpenCL HD Graphics
Device: Intel(R) Graphics [0x56a0]
Driver version : 22.49.25018.24 (Linux x64)
Compute units : 512
Clock frequency...
```
Don't have Windows :(
FWIW, a similar problem occurs on RHEL-like systems
Hi @danhoeflinger,

> My expectation is that if oneDPL were to support something like this, it would be in the context of [kernel template APIs](https://oneapi-src.github.io/oneDPL/kernel_templates_main.html). If you have a clear...
> I suggest taking a look at the SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR environment variable: [intel.github.io/llvm-docs/EnvironmentVariables.html#debugging-variables-for-level-zero-plugin](https://intel.github.io/llvm-docs/EnvironmentVariables.html#debugging-variables-for-level-zero-plugin)

That will not work for other backends, unfortunately; and even for L0 it is not a user-friendly solution. But...
> An extension function sycl::event sycl::queue::ext_record_event() that performs the equivalent of a cudaEventRecord on an in-order queue

Have you considered using [`sycl_ext_oneapi_enqueue_barrier`](https://github.com/intel/llvm/blob/bd1f685f0799dc90503050b76ce78ebbe43b0163/sycl/doc/extensions/supported/sycl_ext_oneapi_enqueue_barrier.asciidoc)? `sycl::queue::ext_oneapi_submit_barrier()` does exactly that, as far as I...
> Thanks, I've looked into the implementation of ext_oneapi_submit_barrier and it appears to just return the event from the last SYCL submission that was made to the in-order queue. It...