Michael Aziz
> Any reason these cooperative kernels cannot be fused? Could we implement fusion of cooperative kernels by creating a new cooperative kernel and that's it? I think the reason this...
> For Kernel Fusion, we'd need some modifications to the error message [here](https://github.com/victor-eds/llvm/blob/6c042e0b39c78ebdd564bd60bea1f60f84e07486/sycl/source/detail/scheduler/graph_builder.cpp#L1016) to avoid saying we do not support "Kernel" CG type.
>
> Also, having a test wouldn't...
```
Failed Tests (8):
  SYCL :: Assert/assert_in_kernels_win.cpp
  SYCL :: Assert/assert_in_multiple_tus_one_ndebug_win.cpp
  SYCL :: Assert/assert_in_multiple_tus_win.cpp
  SYCL :: Assert/assert_in_one_kernel_win.cpp
  SYCL :: Assert/assert_in_simultaneous_kernels_win.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp
  SYCL :: Plugin/sycl-ls-unified-runtime.cpp
```
These...
SYCL tests for this PR are passing here: https://github.com/intel/llvm/pull/13653. The current implementation of `urEnqueueCooperativeKernelLaunchExp` is nearly identical to `urEnqueueKernelLaunch`. It has some minor differences and calls `zeCommandListAppendLaunchCooperativeKernel`. I'm not sure...
@JackAKirk, thanks for pointing this out. I agree that what we have now is confusing. For CUDA, the documentation says the following: > The total number of blocks launched cannot...
> Sure we could work out the device that has been last used from the CUcontext that is currently set, but is this really the semantics of the query? This...
@kaanolgu, I don't think this is a bug in our build system since you're able to get it working using your system's GCC. Could you please try comparing the `LD_LIBRARY_PATH`...
@SimonWang9610, the allocation function you're using returns a `nullptr` if there are not enough resources to allocate the requested memory. If you're using the L0 adapter, you can use [this](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md#free-global-memory)...
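A minimal sketch of guarding against that `nullptr`, written in plain C++ so it stands alone. The helper name `alloc_or_throw` is hypothetical and not part of SYCL; the callable passed in is a stand-in for whichever allocation function is actually being used.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper (not a SYCL API): wraps any allocator that reports
// failure by returning nullptr and turns that failure into an exception,
// so a failed allocation cannot be dereferenced by accident.
template <typename AllocFn>
void* alloc_or_throw(AllocFn alloc, std::size_t bytes) {
    void* p = alloc(bytes);
    if (p == nullptr) {
        throw std::bad_alloc{};
    }
    return p;
}
```

With a real device allocator, the lambda would forward to that allocator instead of `std::malloc`; checking the result immediately keeps the failure close to its cause.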
@jinz2014, this looks like a macro defined for CUDA source files. Could you use the `alignas` specifier for what you're trying to do?
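As a rough illustration of the `alignas` suggestion, assuming the macro in question only requests a particular alignment: the type name `Vec4`, the 16-byte value, and the `is_aligned` helper are made up for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative type: alignas is standard C++11 and needs no
// compiler-specific macro to request a 16-byte alignment.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// The requested alignment is visible at compile time.
static_assert(alignof(Vec4) == 16, "Vec4 should be 16-byte aligned");

// Hypothetical helper: checks an address against an alignment at run time.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

`alignas` can also be applied to individual variables or members, for example `alignas(64) char buf[64];`.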
I'm able to reproduce this issue locally. Thanks for reporting this.