Michael Aziz comments

Results: 11 comments by Michael Aziz

[SYCL] Use PI APIs for cooperative kernels

> Any reason these cooperative kernels cannot be fused? Could we implement fusion of cooperative kernels by creating a new cooperative kernel and that's it? I think the reason this...

[SYCL] Use PI APIs for cooperative kernels

> For Kernel Fusion, we'd need some modifications to the error message [here](https://github.com/victor-eds/llvm/blob/6c042e0b39c78ebdd564bd60bea1f60f84e07486/sycl/source/detail/scheduler/graph_builder.cpp#L1016) to avoid saying we do not support "Kernel" CG type.
>
> Also, having a test wouldn't...

[SYCL] Use PI APIs for cooperative kernels

```
Failed Tests (8):
  SYCL :: Assert/assert_in_kernels_win.cpp
  SYCL :: Assert/assert_in_multiple_tus_one_ndebug_win.cpp
  SYCL :: Assert/assert_in_multiple_tus_win.cpp
  SYCL :: Assert/assert_in_one_kernel_win.cpp
  SYCL :: Assert/assert_in_simultaneous_kernels_win.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp
  SYCL :: Plugin/sycl-ls-unified-runtime.cpp
```

These...

Implement L0 cooperative kernel functions

SYCL tests for this PR are passing here: https://github.com/intel/llvm/pull/13653. The current implementation of `urEnqueueCooperativeKernelLaunchExp` is nearly identical to `urEnqueueKernelLaunch`. It has some minor differences and calls `zeCommandListAppendLaunchCooperativeKernel`. I'm not sure...

Clarify semantics of `urKernelSuggestMaxCooperativeGroupCountExp`

@JackAKirk, thanks for pointing this out. I agree that what we have now is confusing. For CUDA, the documentation says the following:

> The total number of blocks launched cannot...

Clarify semantics of `urKernelSuggestMaxCooperativeGroupCountExp`

> Sure we could work out the device that has been last used from the CUcontext that is currently set, but is this really the semantics of the query? This...

Compiling with Spack Installed GCC Errors

@kaanolgu, I don't think this is a bug in our build system since you're able to get it working using your system's GCC. Could you please try comparing the `LD_LIBRARY_PATH`...

failed to allocate memory when using malloc_device

@SimonWang9610, the allocation function you're using returns `nullptr` if there are not enough resources to allocate the requested memory. If you're using the L0 adapter, you can use [this](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md#free-global-memory)...

unexpected unqualified-id: align

@jinz2014, this looks like a macro defined for CUDA source files. Could you use the `alignas` specifier for what you're trying to do?
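
For reference, here is a minimal sketch of what the standard `alignas` specifier looks like in portable C++. The code from the original issue is not shown here, so the `Vec4` type, the `is_aligned` helper, and the `demo_buffer_is_aligned` function are hypothetical names chosen only for illustration, and the 16-byte and 32-byte alignments are assumed values.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical type (not from the original issue): alignas is a standard
// C++11 specifier, so it does not depend on any CUDA-provided macro.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Checked at compile time: the type's alignment matches the request.
static_assert(alignof(Vec4) == 16, "Vec4 should be 16-byte aligned");

// Hypothetical helper: returns true if ptr sits on the given byte boundary.
inline bool is_aligned(const void *ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// alignas can also be applied to a single variable, such as a local buffer.
// The 32-byte value here is an assumption for illustration.
inline bool demo_buffer_is_aligned() {
    alignas(32) float buffer[8] = {};
    return is_aligned(buffer, 32);
}
```

Whether this is a drop-in replacement depends on how the original macro was used; the sketch only shows the two common placements, on a type and on a variable.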

No diagnostic for conflicting kernel names in different source files

I'm able to reproduce this issue locally. Thanks for reporting this.