llvm icon indicating copy to clipboard operation
llvm copied to clipboard

Application aborted when passing nullptr as a kernel argument on OpenCL backend

Open al42and opened this issue 3 years ago • 6 comments

Describe the bug

Templated code that sometimes passes a local accessor to the kernel and sometimes passes a nullptr fails with a runtime error during kernel setup when running on Intel GPU via OpenCL. Using LevelZero or enabling dead argument optimization during compilation resolves the problem. So does simply using the accessor, without extracting the pointer from it.

To Reproduce

  1. Code snippet: https://gist.github.com/al42and/b2eb3bd19c30fdda11f7551294684c13. Could not make it much shorter. The example is artificial, but such a pattern appears in a larger codebase.
  2. Compile: clang++ -fsycl local_accessors.cpp -o local_accessors
  3. Rub: SYCL_DEVICE_FILTER=opencl:gpu ./local_accessors
  4. Output:
Device name: Intel(R) Iris(R) Xe Graphics [0x9a49]
Running kernel with SLM: 
    Done
Running kernel without SLM: 
Abort was called at 211 line in file:
../../neo/shared/source/kernel/kernel_arg_descriptor.h
Aborted (core dumped)

Adding -fsycl-dead-args-optimization to the compiler arguments or switching to LevelZero backend with SYCL_DEVICE_FILTER=level_zero:gpu makes both tests pass.

Environment:

Tested on two systems:

Laptop:

  • OS: Ubuntu 20.10
  • Target device and vendor: Intel(R) Iris(R) Xe Graphics [0x9a49]
  • DPC++ version: Intel(R) oneAPI DPC++/C++ Compiler 2021.4.0 (2021.4.0.20210924)
  • Dependencies version: compute-runtime 21.45.21574

Dev server:

  • OS: Ubuntu 20.04
  • Target device and vendor: Intel(R) Iris(R) Xe MAX Graphics [0x4905]
  • DPC++ version: clang version 14.0.0 (https://github.com/intel/llvm.git 3205368f67e0c60db08ac51cb26a9e978879d992)
  • Dependencies version: compute-runtime 21.31.20514

al42and avatar Nov 16 '21 20:11 al42and

What if the accessor auto gm_data = buf.get_access<mode::write>(cgh);, line 26, is created in the in the command group scope, outside of the kernel, line 51 in your code before passing it to the kernel?

Michoumichmich avatar Nov 16 '21 20:11 Michoumichmich

What if the accessor auto gm_data = buf.get_access<mode::write>(cgh);, line 26, is created in the in the command group scope, outside of the kernel, line 51 in your code before passing it to the kernel?

You mean something like https://gist.github.com/al42and/8d9cce05bd771ce68953e12a603fe569? Same behavior: aborted with OpenCL and no dead arg optimization.

al42and avatar Nov 16 '21 20:11 al42and

What if the accessor auto gm_data = buf.get_access<mode::write>(cgh);, line 26, is created in the in the command group scope, outside of the kernel, line 51 in your code before passing it to the kernel?

You mean something like https://gist.github.com/al42and/8d9cce05bd771ce68953e12a603fe569? Same behavior: aborted with OpenCL and no dead arg optimization.

Oh yes, my bad, I misread the code, it's the same, (I thought you were taking the accessor in the kernel). I tried on the CPU OpenCL backend as well as on CUDA and FPGA (emulation) and the code works on all of these targets, at least. Maybe you're not allowed to have a null pointer to local memory ? You could maybe return an auto from getLocalPtr() to get a sycl::local_ptr iff using local memory ?

Michoumichmich avatar Nov 16 '21 20:11 Michoumichmich

I guess the SYCL to OpenCL kernel lowering could be improved to handle std::nullptr_t which the type captured when your bool condition is false.

Otherwise your bug seems to be in your opencl driver rather than dpc++ (../../neo/shared/source/kernel/kernel_arg_descriptor.h that's from the opencl driver). My 2 cents is they don't handle nullptr with USM.

Naghasan avatar Nov 17 '21 10:11 Naghasan

There might be some confusion here. std::nullptr_t is a specific type that allows some shortcut at compile time using overload/concept/... But passing a null pointer à la NULL during execution will not be catched as std::nullptr_t and will have to be resolved by the runtime.

keryell avatar Nov 18 '21 20:11 keryell

The problem persists with oneAPI DPC++/C++ Compiler 2022.1.0, compute-runtime 22.29.23750.

Note: It seems -fsycl-dead-args-optimization is now enabled for optimization levels -O2 and -O3, so this bug is only triggered with -O0/-O1 of with explicit -fno-sycl-dead-args-optimization.

al42and avatar Sep 14 '22 16:09 al42and