llvm
llvm copied to clipboard
Application aborted when passing nullptr as a kernel argument on OpenCL backend
Describe the bug
Templated code that sometimes passes a local accessor to the kernel and sometimes passes a nullptr fails with a runtime error during kernel setup when running on Intel GPU via OpenCL. Using LevelZero or enabling dead argument optimization during compilation resolves the problem. So does simply using the accessor, without extracting the pointer from it.
To Reproduce
- Code snippet: https://gist.github.com/al42and/b2eb3bd19c30fdda11f7551294684c13. Could not make it much shorter. The example is artificial, but such a pattern appears in a larger codebase.
- Compile:
clang++ -fsycl local_accessors.cpp -o local_accessors - Rub:
SYCL_DEVICE_FILTER=opencl:gpu ./local_accessors - Output:
Device name: Intel(R) Iris(R) Xe Graphics [0x9a49]
Running kernel with SLM:
Done
Running kernel without SLM:
Abort was called at 211 line in file:
../../neo/shared/source/kernel/kernel_arg_descriptor.h
Aborted (core dumped)
Adding -fsycl-dead-args-optimization to the compiler arguments or switching to LevelZero backend with SYCL_DEVICE_FILTER=level_zero:gpu makes both tests pass.
Environment:
Tested on two systems:
Laptop:
- OS: Ubuntu 20.10
- Target device and vendor: Intel(R) Iris(R) Xe Graphics [0x9a49]
- DPC++ version: Intel(R) oneAPI DPC++/C++ Compiler 2021.4.0 (2021.4.0.20210924)
- Dependencies version: compute-runtime 21.45.21574
Dev server:
- OS: Ubuntu 20.04
- Target device and vendor: Intel(R) Iris(R) Xe MAX Graphics [0x4905]
- DPC++ version: clang version 14.0.0 (https://github.com/intel/llvm.git 3205368f67e0c60db08ac51cb26a9e978879d992)
- Dependencies version: compute-runtime 21.31.20514
What if the accessor auto gm_data = buf.get_access<mode::write>(cgh);, line 26, is created in the in the command group scope, outside of the kernel, line 51 in your code before passing it to the kernel?
What if the accessor
auto gm_data = buf.get_access<mode::write>(cgh);, line 26, is created in the in the command group scope, outside of the kernel, line 51 in your code before passing it to the kernel?
You mean something like https://gist.github.com/al42and/8d9cce05bd771ce68953e12a603fe569? Same behavior: aborted with OpenCL and no dead arg optimization.
What if the accessor
auto gm_data = buf.get_access<mode::write>(cgh);, line 26, is created in the in the command group scope, outside of the kernel, line 51 in your code before passing it to the kernel?You mean something like https://gist.github.com/al42and/8d9cce05bd771ce68953e12a603fe569? Same behavior: aborted with OpenCL and no dead arg optimization.
Oh yes, my bad, I misread the code, it's the same, (I thought you were taking the accessor in the kernel). I tried on the CPU OpenCL backend as well as on CUDA and FPGA (emulation) and the code works on all of these targets, at least. Maybe you're not allowed to have a null pointer to local memory ? You could maybe return an auto from getLocalPtr() to get a sycl::local_ptr iff using local memory ?
I guess the SYCL to OpenCL kernel lowering could be improved to handle std::nullptr_t which the type captured when your bool condition is false.
Otherwise your bug seems to be in your opencl driver rather than dpc++ (../../neo/shared/source/kernel/kernel_arg_descriptor.h that's from the opencl driver). My 2 cents is they don't handle nullptr with USM.
There might be some confusion here.
std::nullptr_t is a specific type that allows some shortcut at compile time using overload/concept/...
But passing a null pointer à la NULL during execution will not be catched as std::nullptr_t and will have to be resolved by the runtime.
The problem persists with oneAPI DPC++/C++ Compiler 2022.1.0, compute-runtime 22.29.23750.
Note: It seems -fsycl-dead-args-optimization is now enabled for optimization levels -O2 and -O3, so this bug is only triggered with -O0/-O1 of with explicit -fno-sycl-dead-args-optimization.