Simple SYCL program fails to compile at `-O0`, works at `-O2`
Describe the bug
The attached SYCL program test.cc builds and runs if compiled with -O2, but causes a compiler crash with -O0.
To reproduce
- Include a code snippet that is as short as possible
See the file test.cc, attached.
- Specify the command which should be used to compile the program
source /opt/intel/oneapi/setvars.sh
icpx -O0 -g -fsycl -fsycl-targets=intel_gpu_tgllp test.cc -o test
- Specify the command which should be used to launch the program
./test
- Indicate what is wrong and what was expected
The compiler crashes with the following error message:
$ icpx -O0 -g -fsycl -fsycl-targets=intel_gpu_tgllp test.cc -o test
PLEASE append the compiler options "-save-temps -v", rebuild the application to get the full command which is failing and submit a bug report to https://software.intel.com/en-us/support/priority-support which includes the failing command, input files for the command and the crash backtrace (if any).
Stack dump:
0. Program arguments: /opt/intel/oneapi/compiler/2024.2/bin/compiler/clang -cc1 -triple spir64_gen-unknown-unknown -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -mllvm -sycl-opt -fsycl-instrument-device-code -Wno-sycl-strict -fsycl-int-header=/tmp/icpx-f6b3135d85/test-header-3c4c23.h -fsycl-int-footer=/tmp/icpx-f6b3135d85/test-footer-2d631b.h -sycl-std=2020 -fno-sycl-force-inline-kernel-lambda -ffine-grained-bitfield-accesses -fsycl-unique-prefix=uidbe51d600402e318e -fsycl-disable-range-rounding -D__SYCL_TARGET_INTEL_GPU_TGLLP__ -D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__=1 -Wspir-compat -emit-llvm-bc -emit-llvm-uselists -dumpdir test- -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name test.cc -fsycl-use-main-file-name -full-main-file-name test.cc -mrelocation-model static -fveclib=SVML -faltmathlib=SVMLAltMathLibrary -mframe-pointer=all -fapprox-func -funsafe-math-optimizations -fno-signed-zeros -mreassociate -freciprocal-math -ffp-contract=fast -fno-rounding-math -complex-range=promoted -mconstructor-aliases -aux-target-cpu x86-64 -debug-info-kind=limited -dwarf-version=4 -debugger-tuning=gdb -fdebug-compilation-dir=/home/fwyzard/test/sycl_warp_size_v2 -fcoverage-compilation-dir=/home/fwyzard/test/sycl_warp_size_v2 -fclang-abi-compat=17 -resource-dir /opt/intel/oneapi/compiler/2024.2/lib/clang/19 -internal-isystem /opt/intel/oneapi/compiler/2024.2/bin/compiler/../../include/sycl -internal-isystem /opt/intel/oneapi/compiler/2024.2/bin/compiler/../../include/sycl/stl_wrappers -internal-isystem /opt/intel/oneapi/compiler/2024.2/bin/compiler/../../include -I/opt/intel/oneapi/tbb/2021.13/env/../include -I/opt/intel/oneapi/mpi/2021.13/include -I/opt/intel/oneapi/mkl/2024.2/include -I/opt/intel/oneapi/ippcp/2021.12/include -I/opt/intel/oneapi/ipp/2021.12/include -I/opt/intel/oneapi/dpl/2022.6/include -I/opt/intel/oneapi/dpcpp-ct/2024.2/include -I/opt/intel/oneapi/dnnl/2024.2/include 
-I/opt/intel/oneapi/dev-utilities/2024.2/include -I/opt/intel/oneapi/dal/2024.6/include/dal -I/opt/intel/oneapi/ccl/2021.13/include -internal-isystem /opt/intel/oneapi/compiler/2024.2/bin/compiler/../../opt/compiler/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/x86_64-linux-gnu/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/backward -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/x86_64-linux-gnu/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/backward -internal-isystem /opt/intel/oneapi/compiler/2024.2/lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../x86_64-linux-gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /opt/intel/oneapi/compiler/2024.2/lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../x86_64-linux-gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O0 -std=c++17 -fdeprecated-macro -ferror-limit 19 -fheinous-gnu-extensions -fgpu-rdc -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -fcolor-diagnostics -mllvm -disable-hir-generate-mkl-call -mllvm -intel-abi-compatible=true -dwarf-debug-flags " --driver-mode=g++ --intel -O0 -g -fsycl-targets=intel_gpu_tgllp test.cc -o test -fveclib=SVML -faltmathlib=SVML -fheinous-gnu-extensions -dumpdir test- -march=tgllp" -D__GCC_HAVE_DWARF2_CFI_ASM=1 -fintel-compatibility -fintel-compatibility-disable=FakeLoad -fintel-libirc-allowed -fintel-libimf-allowed -o 
/tmp/icpx-f6b3135d85/test-tgllp-091b76.bc -x c++ test.cc
1. <eof> parser at end of file
2. Optimizer
#0 0x000063ea6b6b44f3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x5f294f3)
#1 0x000063ea6b6b32d0 llvm::sys::RunSignalHandlers() (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x5f282d0)
#2 0x000063ea6b6b4a5b SignalHandler(int) Signals.cpp:0:0
#3 0x0000705235e42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
#4 0x000063ea6bc7158d (anonymous namespace)::stripToMemorySource(llvm::Value*) MutatePrintfAddrspace.cpp:0:0
#5 0x000063ea6bc7127c (anonymous namespace)::setFuncCallsOntoCASPrintf(llvm::Function*, llvm::Function*, llvm::SmallVector<llvm::Function*, 8u>&) MutatePrintfAddrspace.cpp:0:0
#6 0x000063ea6bc70f85 llvm::SYCLMutatePrintfAddrspacePass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x64e5f85)
#7 0x000063ea6bbc121d llvm::detail::PassModel<llvm::Module, llvm::SYCLMutatePrintfAddrspacePass, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) BackendUtil.cpp:0:0
#8 0x000063ea699aaffa llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x421fffa)
#9 0x000063ea6a6be8e0 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream>>&, std::__1::unique_ptr<llvm::ToolOutputFile, std::__1::default_delete<llvm::ToolOutputFile>>&) BackendUtil.cpp:0:0
#10 0x000063ea69e2409f clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream>>) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x469909f)
#11 0x000063ea69eac42a clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) CodeGenAction.cpp:0:0
#12 0x000063ea697bbc0d clang::ParseAST(clang::Sema&, bool, bool) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4030c0d)
#13 0x000063ea6a51b358 clang::CodeGenAction::ExecuteAction() (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4d90358)
#14 0x000063ea6a51e3ca clang::FrontendAction::Execute() (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4d933ca)
#15 0x000063ea6a51df10 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4d92f10)
#16 0x000063ea6a61d03e clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4e9203e)
#17 0x000063ea6a3a9e76 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4c1ee76)
#18 0x000063ea6a6242d5 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#19 0x000063ea6a3ad39a clang_main(int, char**, llvm::ToolContext const&) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4c2239a)
#20 0x000063ea6a0288e5 main (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x489d8e5)
#21 0x0000705235e29d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#22 0x0000705235e29e40 call_init ./csu/../csu/libc-start.c:128:20
#23 0x0000705235e29e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#24 0x000063ea6ab9fe1a _start /localdisk2/test/toolchain-cross/src/glibc/csu/../sysdeps/x86_64/start.S:122:0
icpx: error: unable to execute command: Segmentation fault (core dumped)
icpx: error: clang frontend command failed due to signal (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.2/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.2/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s).
The same program builds and runs fine if compiled with -O2:
$ icpx -O2 -g -fsycl -fsycl-targets=intel_gpu_tgllp test.cc -o test
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL
Build succeeded.
Environment
- OS: Ubuntu Linux 22.04
- Target device and vendor: Intel Tiger Lake LP integrated GPU.
- DPC++ version: Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711)
- Dependencies version:
[opencl:cpu][opencl:0] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics OpenCL 3.0 NEO [24.22.29735.27]
[level_zero:gpu][level_zero:0] Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3050 Ti Laptop GPU 8.6 [CUDA 12.6]
Platforms: 4
Platform [#1]:
Version : OpenCL 3.0 LINUX
Name : Intel(R) OpenCL
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : cpu
Version : OpenCL 3.0 (Build 0)
Name : 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
Vendor : Intel(R) Corporation
Driver : 2024.18.7.0.11_160000
Aspects : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_oneapi_srgb ext_oneapi_native_assert ext_intel_legacy_image ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group
info::device::sub_group_sizes: 4 8 16 32 64
Platform [#2]:
Version : OpenCL 3.0
Name : Intel(R) OpenCL Graphics
Vendor : Intel(R) Corporation
Devices : 1
Device [#1]:
Type : gpu
Version : OpenCL 3.0 NEO
Name : Intel(R) UHD Graphics
Vendor : Intel(R) Corporation
Driver : 24.22.29735.27
Aspects : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations atomic64 ext_oneapi_srgb ext_intel_device_id ext_intel_legacy_image ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group
info::device::sub_group_sizes: 8 16 32
Platform [#3]:
Version : 1.3
Name : Intel(R) Level-Zero
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 1.3
Name : Intel(R) UHD Graphics
Vendor : Intel(R) Corporation
Driver : 1.3.29735
Aspects : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_graph
info::device::sub_group_sizes: 8 16 32
Platform [#4]:
Version : CUDA 12.6
Name : NVIDIA CUDA BACKEND
Vendor : NVIDIA Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 8.6
Name : NVIDIA GeForce RTX 3050 Ti Laptop GPU
Vendor : NVIDIA Corporation
Driver : CUDA 12.6
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_bfloat16_math_functions ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering
ur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
info::device::sub_group_sizes: 32
default_selector() : gpu, Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
accelerator_selector() : No device of requested type available. Please chec...
cpu_selector() : cpu, Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
gpu_selector() : gpu, Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
custom_selector(gpu) : gpu, Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
custom_selector(cpu) : cpu, Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
custom_selector(acc) : No device of requested type available. Please chec...
Additional context
No response
@ivorobts FYI
Do you have any workarounds for this?
The only workaround I found was to compile with -Og, -O1, or a higher optimization level.
Compiling with -O0 still crashes with Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4.
Yeah, I was on 2025.0.1 by the looks of things. I was mainly asking because I wanted the VS Code clangd extension (at least when Bazel is used) to stop crying about /opt/intel/oneapi/compiler/2025.0/include/sycl/CL/__spirv/spirv_ops.hpp, which reports an error because -fdeclare-spirv-builtins is not set! I think I'll try to find a way to ignore the error so that I don't get any red squiggles. I guess I could use the -O0 flag with -fdeclare-spirv-builtins when I am generating compile_commands.json. Thank you, hope you're having a good Sunday!
One of the things I see in the testcase is this:
static const char* __DEVICE_CONSTANT__ format = FORMAT;
According to the OpenCL spec, the format string must reside in the constant address space. So, we need the code to be:
static const char* __DEVICE_CONSTANT__ const format = FORMAT;
Making this change to the source code helps to resolve the issue.
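To make the difference between the two declarations concrete, here is a minimal host-side sketch. It is not the attached test.cc: the macro `DEVICE_CONSTANT` stands in for the test case's `__DEVICE_CONSTANT__` (whose definition is not shown in this thread and is assumed here to expand to `__attribute__((opencl_constant))` on the device and to nothing on the host), and the `FORMAT` string is made up for illustration. The sketch only shows that the extra `const` qualifies the pointer itself; it says nothing about how the device compiler treats either form.

```cpp
#include <cstring>
#include <type_traits>

// Assumption: a stand-in for the test case's __DEVICE_CONSTANT__ macro.
// On the device it is assumed to select the constant address space;
// on the host it expands to nothing.
#ifdef __SYCL_DEVICE_ONLY__
#define DEVICE_CONSTANT __attribute__((opencl_constant))
#else
#define DEVICE_CONSTANT
#endif

// Hypothetical format string; the real FORMAT from test.cc is not shown.
#define FORMAT "value: %d\n"

// Original form: the characters are const, but the pointer can be reassigned.
static const char* DEVICE_CONSTANT format_before = FORMAT;

// Suggested form: the pointer itself is const as well.
static const char* DEVICE_CONSTANT const format_after = FORMAT;

// Whether the pointer object (not the pointed-to characters) is const.
constexpr bool before_is_const =
    std::is_const<decltype(format_before)>::value;
constexpr bool after_is_const =
    std::is_const<decltype(format_after)>::value;

static_assert(!before_is_const, "original declaration: mutable pointer");
static_assert(after_is_const, "suggested declaration: const pointer");

// Both pointers refer to the same text; only the pointer's constness differs.
inline bool same_text() {
  return std::strcmp(format_before, format_after) == 0;
}
```

Because the checks are `static_assert`s, the sketch is verified by any C++11 (or later) host compiler without a SYCL toolchain.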
Please let us know if this resolution is agreeable.
Thanks
@asudarsa thank you for the suggestion.
If I add the const as you suggest, I get a different error when compiling with -O0:
fwyzard@fool:~/test/sycl_issues_15183$ source /opt/intel/oneapi/setvars.sh
fwyzard@fool:~/test/sycl_issues_15183$ icpx -O0 -g -fsycl -fsycl-targets=intel_gpu_tgllp test.cc -o test
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL
error: Cannot compile a kernel in the SIMD mode specified by intel_reqd_sub_group_size(32)
in kernel: 'typeinfo name for sycl::_V1::event launch<1, do_some_work<32u>, bool*&>(sycl::_V1::queue, sycl::_V1::nd_range<1>, do_some_work<32u>&&, bool*&)::'lambda'(sycl::_V1::handler&)::operator()(sycl::_V1::handler&) const::'lambda'(sycl::_V1::nd_item<1>)'
error: backend compiler failed build.
Build failed with error code: -11
Command was: /usr/bin/ocloc -output /tmp/test-tgllp-894424-45edb1.out -file /tmp/icpx-c1dd8009ef/test-tgllp-fe57ad-12ca31.spv -output_no_suffix -spirv_input -device tgllp -options "-g -cl-opt-disable"
llvm-foreach:
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL
Build succeeded.
icpx: error: gen compiler command failed with exit code 245 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4 (2025.0.4.20241205)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2025.0/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2025.0/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s).
The compilation works fine with -Og/-Os/-O1/-O2/-O3.
Interesting. I tried with the intel/llvm clang++ compiler and that did seem to work. Let me try with the oneAPI compiler.
Thanks
Is the intel/llvm clang++ compiler better than using the oneapi compiler?
I used the intel/llvm compiler built from source, and it is expected to be 'newer' than the oneAPI compiler. Having said that, we would like the test case to work with both compilers.
Thanks