llvm: simple SYCL program fails to compile at `-O0`, works at `-O2`

Describe the bug

The attached SYCL program test.cc builds and runs if compiled with -O2, but causes a compiler crash with -O0.

To reproduce

Include a code snippet that is as short as possible

See the file test.cc, attached.

Specify the command which should be used to compile the program

source /opt/intel/oneapi/setvars.sh
icpx -O0 -g -fsycl -fsycl-targets=intel_gpu_tgllp test.cc -o test

Specify the command which should be used to launch the program

./test

Indicate what is wrong and what was expected

The compiler crashes with the following error message:

$ icpx -O0 -g -fsycl -fsycl-targets=intel_gpu_tgllp test.cc -o test
PLEASE append the compiler options "-save-temps -v", rebuild the application to get the full command which is failing and submit a bug report to https://software.intel.com/en-us/support/priority-support which includes the failing command, input files for the command and the crash backtrace (if any).
Stack dump:
0.      Program arguments: /opt/intel/oneapi/compiler/2024.2/bin/compiler/clang -cc1 -triple spir64_gen-unknown-unknown -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -mllvm -sycl-opt -fsycl-instrument-device-code -Wno-sycl-strict -fsycl-int-header=/tmp/icpx-f6b3135d85/test-header-3c4c23.h -fsycl-int-footer=/tmp/icpx-f6b3135d85/test-footer-2d631b.h -sycl-std=2020 -fno-sycl-force-inline-kernel-lambda -ffine-grained-bitfield-accesses -fsycl-unique-prefix=uidbe51d600402e318e -fsycl-disable-range-rounding -D__SYCL_TARGET_INTEL_GPU_TGLLP__ -D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__=1 -Wspir-compat -emit-llvm-bc -emit-llvm-uselists -dumpdir test- -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name test.cc -fsycl-use-main-file-name -full-main-file-name test.cc -mrelocation-model static -fveclib=SVML -faltmathlib=SVMLAltMathLibrary -mframe-pointer=all -fapprox-func -funsafe-math-optimizations -fno-signed-zeros -mreassociate -freciprocal-math -ffp-contract=fast -fno-rounding-math -complex-range=promoted -mconstructor-aliases -aux-target-cpu x86-64 -debug-info-kind=limited -dwarf-version=4 -debugger-tuning=gdb -fdebug-compilation-dir=/home/fwyzard/test/sycl_warp_size_v2 -fcoverage-compilation-dir=/home/fwyzard/test/sycl_warp_size_v2 -fclang-abi-compat=17 -resource-dir /opt/intel/oneapi/compiler/2024.2/lib/clang/19 -internal-isystem /opt/intel/oneapi/compiler/2024.2/bin/compiler/../../include/sycl -internal-isystem /opt/intel/oneapi/compiler/2024.2/bin/compiler/../../include/sycl/stl_wrappers -internal-isystem /opt/intel/oneapi/compiler/2024.2/bin/compiler/../../include -I/opt/intel/oneapi/tbb/2021.13/env/../include -I/opt/intel/oneapi/mpi/2021.13/include -I/opt/intel/oneapi/mkl/2024.2/include -I/opt/intel/oneapi/ippcp/2021.12/include -I/opt/intel/oneapi/ipp/2021.12/include -I/opt/intel/oneapi/dpl/2022.6/include -I/opt/intel/oneapi/dpcpp-ct/2024.2/include -I/opt/intel/oneapi/dnnl/2024.2/include 
-I/opt/intel/oneapi/dev-utilities/2024.2/include -I/opt/intel/oneapi/dal/2024.6/include/dal -I/opt/intel/oneapi/ccl/2021.13/include -internal-isystem /opt/intel/oneapi/compiler/2024.2/bin/compiler/../../opt/compiler/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/x86_64-linux-gnu/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/backward -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/x86_64-linux-gnu/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/backward -internal-isystem /opt/intel/oneapi/compiler/2024.2/lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../x86_64-linux-gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /opt/intel/oneapi/compiler/2024.2/lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../x86_64-linux-gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O0 -std=c++17 -fdeprecated-macro -ferror-limit 19 -fheinous-gnu-extensions -fgpu-rdc -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -fcolor-diagnostics -mllvm -disable-hir-generate-mkl-call -mllvm -intel-abi-compatible=true -dwarf-debug-flags " --driver-mode=g++ --intel -O0 -g -fsycl-targets=intel_gpu_tgllp test.cc -o test -fveclib=SVML -faltmathlib=SVML -fheinous-gnu-extensions -dumpdir test- -march=tgllp" -D__GCC_HAVE_DWARF2_CFI_ASM=1 -fintel-compatibility -fintel-compatibility-disable=FakeLoad -fintel-libirc-allowed -fintel-libimf-allowed -o 
/tmp/icpx-f6b3135d85/test-tgllp-091b76.bc -x c++ test.cc
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x000063ea6b6b44f3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x5f294f3)
 #1 0x000063ea6b6b32d0 llvm::sys::RunSignalHandlers() (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x5f282d0)
 #2 0x000063ea6b6b4a5b SignalHandler(int) Signals.cpp:0:0
 #3 0x0000705235e42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x000063ea6bc7158d (anonymous namespace)::stripToMemorySource(llvm::Value*) MutatePrintfAddrspace.cpp:0:0
 #5 0x000063ea6bc7127c (anonymous namespace)::setFuncCallsOntoCASPrintf(llvm::Function*, llvm::Function*, llvm::SmallVector<llvm::Function*, 8u>&) MutatePrintfAddrspace.cpp:0:0
 #6 0x000063ea6bc70f85 llvm::SYCLMutatePrintfAddrspacePass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x64e5f85)
 #7 0x000063ea6bbc121d llvm::detail::PassModel<llvm::Module, llvm::SYCLMutatePrintfAddrspacePass, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) BackendUtil.cpp:0:0
 #8 0x000063ea699aaffa llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x421fffa)
 #9 0x000063ea6a6be8e0 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream>>&, std::__1::unique_ptr<llvm::ToolOutputFile, std::__1::default_delete<llvm::ToolOutputFile>>&) BackendUtil.cpp:0:0
#10 0x000063ea69e2409f clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream>>) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x469909f)
#11 0x000063ea69eac42a clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) CodeGenAction.cpp:0:0
#12 0x000063ea697bbc0d clang::ParseAST(clang::Sema&, bool, bool) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4030c0d)
#13 0x000063ea6a51b358 clang::CodeGenAction::ExecuteAction() (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4d90358)
#14 0x000063ea6a51e3ca clang::FrontendAction::Execute() (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4d933ca)
#15 0x000063ea6a51df10 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4d92f10)
#16 0x000063ea6a61d03e clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4e9203e)
#17 0x000063ea6a3a9e76 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4c1ee76)
#18 0x000063ea6a6242d5 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#19 0x000063ea6a3ad39a clang_main(int, char**, llvm::ToolContext const&) (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x4c2239a)
#20 0x000063ea6a0288e5 main (/opt/intel/oneapi/compiler/2024.2/bin/compiler/clang+0x489d8e5)
#21 0x0000705235e29d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#22 0x0000705235e29e40 call_init ./csu/../csu/libc-start.c:128:20
#23 0x0000705235e29e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#24 0x000063ea6ab9fe1a _start /localdisk2/test/toolchain-cross/src/glibc/csu/../sysdeps/x86_64/start.S:122:0
icpx: error: unable to execute command: Segmentation fault (core dumped)
icpx: error: clang frontend command failed due to signal (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.2/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.2/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s).

The same program builds and runs fine if compiled with -O2:

$ icpx -O2 -g -fsycl -fsycl-targets=intel_gpu_tgllp test.cc -o test
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL
Build succeeded.

Environment

OS: Ubuntu Linux 22.04
Target device and vendor: Intel Tiger Lake LP integrated GPU.
DPC++ version: Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711)
Dependencies version:

[opencl:cpu][opencl:0] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics OpenCL 3.0 NEO  [24.22.29735.27]
[level_zero:gpu][level_zero:0] Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3050 Ti Laptop GPU 8.6 [CUDA 12.6]

Platforms: 4
Platform [#1]:
    Version  : OpenCL 3.0 LINUX
    Name     : Intel(R) OpenCL
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type       : cpu
        Version    : OpenCL 3.0 (Build 0)
        Name       : 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
        Vendor     : Intel(R) Corporation
        Driver     : 2024.18.7.0.11_160000
        Aspects    : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_oneapi_srgb ext_oneapi_native_assert ext_intel_legacy_image ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group
        info::device::sub_group_sizes: 4 8 16 32 64
Platform [#2]:
    Version  : OpenCL 3.0 
    Name     : Intel(R) OpenCL Graphics
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#1]:
        Type       : gpu
        Version    : OpenCL 3.0 NEO 
        Name       : Intel(R) UHD Graphics
        Vendor     : Intel(R) Corporation
        Driver     : 24.22.29735.27
        Aspects    : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations atomic64 ext_oneapi_srgb ext_intel_device_id ext_intel_legacy_image ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group
        info::device::sub_group_sizes: 8 16 32
Platform [#3]:
    Version  : 1.3
    Name     : Intel(R) Level-Zero
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type       : gpu
        Version    : 1.3
        Name       : Intel(R) UHD Graphics
        Vendor     : Intel(R) Corporation
        Driver     : 1.3.29735
        Aspects    : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_graph
        info::device::sub_group_sizes: 8 16 32
Platform [#4]:
    Version  : CUDA 12.6
    Name     : NVIDIA CUDA BACKEND
    Vendor   : NVIDIA Corporation
    Devices  : 1
        Device [#0]:
        Type       : gpu
        Version    : 8.6
        Name       : NVIDIA GeForce RTX 3050 Ti Laptop GPU
        Vendor     : NVIDIA Corporation
        Driver     : CUDA 12.6
        Aspects    : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_bfloat16_math_functions ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_widthur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
 ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering
        info::device::sub_group_sizes: 32
default_selector()      : gpu, Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
accelerator_selector()  : No device of requested type available. Please chec...
cpu_selector()          : cpu, Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
gpu_selector()          : gpu, Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
custom_selector(gpu)    : gpu, Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
custom_selector(cpu)    : cpu, Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
custom_selector(acc)    : No device of requested type available. Please chec...

Additional context

No response

Aug 23 '24 14:08 fwyzard

@ivorobts FYI

Aug 23 '24 14:08 fwyzard

Do you have any workarounds for this?

Mar 16 '25 15:03 cemlyn007

The only workaround I found was to compile with -Og, -O1, or a higher optimization level.

Compiling with -O0 still crashes with Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4.

Mar 16 '25 17:03 fwyzard

Yeah, I was on 2025.0.1 by the looks of things. I was mainly asking because I wanted the VS Code clangd extension (at least when Bazel is used) to stop complaining: the file /opt/intel/oneapi/compiler/2025.0/include/sycl/CL/__spirv/spirv_ops.hpp reports an error because -fdeclare-spirv-builtins is not set. I think I'll try to find a way to ignore the error so that I don't get any red squiggles. I guess I could use the -O0 flag with -fdeclare-spirv-builtins when generating compile_commands.json. Thank you, hope you're having a good Sunday!

Mar 16 '25 18:03 cemlyn007

One of the things I see in the test case is this declaration:

static const char* __DEVICE_CONSTANT__ format = FORMAT;

According to the OpenCL spec, the format string must reside in the constant address space. So, we need the code to be:

static const char* __DEVICE_CONSTANT__ const format = FORMAT;

Making this change to the source code helps to resolve the issue.
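The difference between the two declarations can be checked on the host with a minimal sketch. It assumes that `__DEVICE_CONSTANT__` expands to nothing outside device compilation (test.cc is not shown here, so the macro's real definition is unknown; in device code it is commonly an `opencl_constant` attribute). `DEMO_DEVICE_CONSTANT`, `DEMO_FORMAT`, `format_mutable`, `format_const`, and `print_value` are hypothetical names used only for illustration. The sketch shows only that the extra `const` makes the pointer itself immutable; it does not reproduce the crash or model address spaces.

```cpp
#include <cstdio>
#include <type_traits>

// Hypothetical stand-in for the test case's __DEVICE_CONSTANT__ macro.
// Assumption: on the host it expands to nothing.
#define DEMO_DEVICE_CONSTANT

// Hypothetical format string standing in for FORMAT.
#define DEMO_FORMAT "value: %d\n"

// Original form: the characters are const, but the pointer can be reassigned.
static const char* DEMO_DEVICE_CONSTANT format_mutable = DEMO_FORMAT;

// Suggested form: the pointer itself is const as well.
static const char* DEMO_DEVICE_CONSTANT const format_const = DEMO_FORMAT;

// decltype of a plain variable name yields its declared type, so these
// checks confirm which of the two pointers is itself const.
static_assert(!std::is_const<decltype(format_mutable)>::value,
              "pointer without the trailing const is mutable");
static_assert(std::is_const<decltype(format_const)>::value,
              "pointer with the trailing const is immutable");

// Prints through the const pointer and returns printf's character count.
static inline int print_value(int v) {
    return std::printf(format_const, v);
}
```

Calling `print_value(42)` prints `value: 42`. Because `format_const` can never be reassigned, a compiler can always trace it back to the string literal, which may be why the stricter declaration avoids the crashing code path.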

Please let us know if this resolution is agreeable.

Thanks

Mar 18 '25 15:03 asudarsa

@asudarsa thank you for the suggestion.

If I add the const as you suggest, I get a different error when compiling with -O0:

fwyzard@fool:~/test/sycl_issues_15183$ source /opt/intel/oneapi/setvars.sh
fwyzard@fool:~/test/sycl_issues_15183$ icpx -O0 -g -fsycl -fsycl-targets=intel_gpu_tgllp test.cc -o test
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL

error: Cannot compile a kernel in the SIMD mode specified by intel_reqd_sub_group_size(32)
in kernel: 'typeinfo name for sycl::_V1::event launch<1, do_some_work<32u>, bool*&>(sycl::_V1::queue, sycl::_V1::nd_range<1>, do_some_work<32u>&&, bool*&)::'lambda'(sycl::_V1::handler&)::operator()(sycl::_V1::handler&) const::'lambda'(sycl::_V1::nd_item<1>)'
error: backend compiler failed build.

Build failed with error code: -11
Command was: /usr/bin/ocloc -output /tmp/test-tgllp-894424-45edb1.out -file /tmp/icpx-c1dd8009ef/test-tgllp-fe57ad-12ca31.spv -output_no_suffix -spirv_input -device tgllp -options "-g -cl-opt-disable"
llvm-foreach: 
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL
Build succeeded.
icpx: error: gen compiler command failed with exit code 245 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4 (2025.0.4.20241205)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2025.0/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2025.0/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s).

The compilation works fine with -Og/-Os/-O1/-O2/-O3.

Mar 18 '25 15:03 fwyzard

Interesting. I tried with the intel/llvm clang++ compiler and that did seem to work. Let me try with the oneAPI compiler.

Thanks

Mar 18 '25 16:03 asudarsa

Is the intel/llvm clang++ compiler better than using the oneapi compiler?

Mar 18 '25 20:03 cemlyn007

> Is the intel/llvm clang++ compiler better than using the oneapi compiler?

I used the intel/llvm compiler built from source, and it is expected to be 'newer' than the oneAPI compiler. Having said that, we would like the test case to work with both compilers.

Thanks

Mar 18 '25 23:03 asudarsa