llvm icon indicating copy to clipboard operation
llvm copied to clipboard

clang-22 asserts/crashes when compiling C++20 coroutines in device code

Open mavrogato opened this issue 1 month ago • 1 comments

Describe the bug

SYCLカーネル内でC++20コルーチン (co_returnを使用する関数) をコンパイルしようとすると、コンパイラ(clang-22)が内部アサーションエラー(PHINode::setIncomingValue)によりクラッシュします。カスタムメモリプールアロケータを使用する特定のパターンで発生します。

To reproduce

  1. Include a code snippet that is as short as possible
#include <sycl/sycl.hpp>
#include <iostream>
#include <coroutine>
#include <utility>
#include <cstddef>

// Arena構造体
struct KernelArena {
    char* buffer;
    mutable std::size_t used; // mutable にすることで const ポインタ経由でも変更可能に
    const std::size_t size;

    // const メンバ関数に変更
    void* alloc(std::size_t alloc_size) const { 
        std::size_t old_used = used; 
        if (old_used + alloc_size > size) { return nullptr; }
        used += alloc_size; // mutable なので変更可能
        return buffer + old_used;
    }
};

// ... (task, promise_type の定義を修正) ...
struct task {
    struct promise_type;
    using handle_type = std::coroutine_handle<promise_type>;
    handle_type h;
    task(handle_type h) : h(h) {}
    ~task() { if (h) h.destroy(); }
    task(const task&) = delete;
    task(task&& other) : h(std::exchange(other.h, nullptr)) {}

    struct promise_type {
        task get_return_object() { return task{handle_type::from_promise(*this)}; }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {} 

        // ここを const KernelArena* に変更し、関数シグネチャを合わせる
        static void* operator new(std::size_t size, const KernelArena* arena_ptr) {
            return arena_ptr->alloc(size); // alloc は const メンバ関数になった
        }
    };
};

// SYCLカーネル内で呼び出すコルーチン関数 (変更なし)
task simple_coroutine_func(const KernelArena* arena_ptr) { co_return; }

// --------------------------------------------------
// main関数とSYCLカーネル
// --------------------------------------------------
int main() {
    sycl::queue q;
    std::cout << "Running on device: " << q.get_device().get_info<sycl::info::device::name>() << std::endl;

    constexpr std::size_t ARENA_SIZE = 4096;
    char* arena_buffer = sycl::malloc_device<char>(ARENA_SIZE, q);
    KernelArena arena_host{arena_buffer, 0, ARENA_SIZE}; 

    // --- コマンドグループ1: データのデバイス転送 ---
    q.submit([&](sycl::handler& h) { 
        h.memcpy(&arena_host, &arena_host, sizeof(KernelArena));
    }).wait(); 

    // --- コマンドグループ2: カーネル実行 ---
    q.submit([&](sycl::handler& h) { 
        sycl::stream out(2048, 80, h);
        
        // ここは [=] 値キャプチャのまま
        h.single_task([=]() { 
            if (arena_host.used > arena_host.size) {
                 out << "Arena allocation failed!" << sycl::endl;
            }
            // 呼び出し時にアドレス & を渡す。constポインタとして渡される
            simple_coroutine_func(&arena_host); 
        });
    }).wait(); 

    // --- コマンドグループ3: 結果のホスト転送 ---
    q.memcpy(&arena_host, &arena_host, sizeof(KernelArena)).wait();

    std::cout << "Kernel finished successfully. Arena used: " << arena_host.used << " bytes." << std::endl;

    sycl::free(arena_buffer, q);
    return 0;
}
  1. Specify the command which should be used to compile the program
clang++ -fsycl -fsycl-targets=spir64-unknown-unknown-opencl -O2 -std=gnu++23 main.cc -o main
  1. Specify the command which should be used to launch the program
  2. Indicate what is wrong and what was expected

Environment

  • OS: Linux

  • Target device and vender: NVIDIA GeForce RTX 3060 Laptop GPU

  • DPC++ version: Intel SYCL compiler development build based on: clang version 22.0.0git ([email protected]:intel/llvm.git ccac4e0910c6a3a7dc684254ec46b58192159c23) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /home/mavrogato/sycl_workspace/llvm/build/bin Build config: +assertions

  • Dependencies version: [e.g. the output of sycl-ls --verbose] <LOADER>[INFO]: loaded adapter 0x0x5608371c1a40 (libur_adapter_cuda.so.0) from /home/mavrogato/sycl_workspace/llvm/build/lib/libur_adapter_cuda.so.0 <LOADER>[INFO]: failed to load adapter 'libur_adapter_hip.so.0' with error: libur_adapter_hip.so.0: cannot open shared object file: No such file or directory <LOADER>[INFO]: failed to load adapter '/home/mavrogato/sycl_workspace/llvm/build/lib/libur_adapter_hip.so.0' with error: /home/mavrogato/sycl_workspace/llvm/build/lib/libur_adapter_hip.so.0: cannot open shared object file: No such file or directory <LOADER>[INFO]: loaded adapter 0x0x5608371c75b0 (libur_adapter_level_zero.so.0) from /home/mavrogato/sycl_workspace/llvm/build/lib/libur_adapter_level_zero.so.0 <LOADER>[INFO]: loaded adapter 0x0x5608371c6b50 (libur_adapter_level_zero_v2.so.0) from /home/mavrogato/sycl_workspace/llvm/build/lib/libur_adapter_level_zero_v2.so.0 <LOADER>[INFO]: failed to load adapter 'libur_adapter_native_cpu.so.0' with error: libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory <LOADER>[INFO]: failed to load adapter '/home/mavrogato/sycl_workspace/llvm/build/lib/libur_adapter_native_cpu.so.0' with error: /home/mavrogato/sycl_workspace/llvm/build/lib/libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory <LOADER>[INFO]: failed to load adapter 'libur_adapter_offload.so.0' with error: libur_adapter_offload.so.0: cannot open shared object file: No such file or directory <LOADER>[INFO]: failed to load adapter '/home/mavrogato/sycl_workspace/llvm/build/lib/libur_adapter_offload.so.0' with error: /home/mavrogato/sycl_workspace/llvm/build/lib/libur_adapter_offload.so.0: cannot open shared object file: No such file or directory <LOADER>[INFO]: loaded adapter 0x0x5608371c9ff0 (libur_adapter_opencl.so.0) from /home/mavrogato/sycl_workspace/llvm/build/lib/libur_adapter_opencl.so.0 [cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 Laptop GPU 8.6 [CUDA 12.8]

Platforms: 1 Platform [#1]: Version : CUDA 12.8 Name : NVIDIA CUDA BACKEND Vendor : NVIDIA Corporation Devices : 1 Device [#0]: Type : gpu Version : 8.6 Name : NVIDIA GeForce RTX 3060 Laptop GPU Vendor : NVIDIA Corporation Driver : CUDA 12.8 UUID : 4f373e67-a5df-4ea0-b89a-254de3611997 DeviceID : 0 Num SubDevices : 0 Num SubSubDevices : 0 Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_cuda_async_barrier ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_widthImages are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_UR_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime. ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_1d_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_external_memory_import ext_oneapi_external_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_fragment ext_oneapi_chunk ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering ext_oneapi_bindless_sampled_image_fetch_1d_usm ext_oneapi_bindless_sampled_image_fetch_2d_usm ext_oneapi_bindless_sampled_image_fetch_2d ext_oneapi_bindless_sampled_image_fetch_3d ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_image_array ext_oneapi_unique_addressing_per_dim ext_oneapi_bindless_images_sample_2d_usm ext_oneapi_bindless_images_gather ext_intel_current_clock_throttle_reasons<CUDA>[ERROR]: UR NVML ERROR: Value: 3 Description: Not Supported Function: urDeviceGetInfo Source Location: /home/mavrogato/sycl_workspace/llvm/unified-runtime/source/adapters/cuda/device.cpp:1143

ext_intel_power_limits ext_oneapi_async_memory_alloc info::device::sub_group_sizes: 32 Architecture: nvidia_gpu_sm_86 default_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 Laptop GPU 8.6 [CUDA 12.8] accelerator_selector() : No device of requested type available. cpu_selector() : No device of requested type available. gpu_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 Laptop GPU 8.6 [CUDA 12.8] custom_selector(gpu) : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 Laptop GPU 8.6 [CUDA 12.8] custom_selector(cpu) : No device of requested type available. custom_selector(acc) : No device of requested type available.

Additional context

[ 8%] Building CXX object CMakeFiles/vulkan-sycl-display-test.dir/aux/test/tuple-support-test.cc.o [ 16%] Linking CXX executable vulkan-sycl-display-test [ 16%] Built target vulkan-sycl-display-test [ 25%] Generating zwp-tablet-v2-private.c [ 33%] Generating xdg-shell-private.c [ 41%] Generating zwp-linux-dmabuf-v1-private.c [ 50%] Building CXX object CMakeFiles/host-utils.dir/wayland-coroutines.cc.o [ 58%] Building C object CMakeFiles/host-utils.dir/xdg-shell-private.c.o [ 66%] Building C object CMakeFiles/host-utils.dir/zwp-tablet-v2-private.c.o [ 75%] Building C object CMakeFiles/host-utils.dir/zwp-linux-dmabuf-v1-private.c.o [ 83%] Linking CXX static library libhost-utils.a [ 83%] Built target host-utils [ 91%] Building CXX object CMakeFiles/vulkan-sycl-display.dir/main.cc.o clang-22: /home/mavrogato/sycl_workspace/llvm/llvm/include/llvm/IR/Instructions.h:2719: void llvm::PHINode::setIncomingValue(unsigned int, llvm::Value*): Assertion `getType() == V->getType() && "All operands to PHI node must be the same type as the PHI node!"' failed. PLEASE submit a bug report to https://github.com/intel/llvm/issues and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /home/mavrogato/sycl_workspace/llvm/build/bin/clang-22 -cc1 -triple spir64-unknown-unknown-opencl -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -O2 -mllvm -sycl-opt -fenable-sycl-dae -Wno-sycl-strict -fsycl-int-header=/tmp/main-header-cab251.h -fsycl-int-footer=/tmp/main-footer-00bf4b.h -sycl-std=2020 -ffine-grained-bitfield-accesses -fsycl-unique-prefix=uid061b524e15e0a77f -D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__=1 -Wspir-compat -fno-offload-use-alloca-addrspace-for-srets -emit-llvm-bc -emit-llvm-uselists -disable-free -clear-ast-before-backend -main-file-name main.cc -mrelocation-model static -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -aux-target-cpu x86-64 -debugger-tuning=gdb -fdebug-compilation-dir=/home/mavrogato/work/2025/vulkan-sycl-display/build -fcoverage-compilation-dir=/home/mavrogato/work/2025/vulkan-sycl-display/build -resource-dir /home/mavrogato/sycl_workspace/llvm/build/lib/clang/22 -dependency-file CMakeFiles/vulkan-sycl-display.dir/main.cc.o.d -MT CMakeFiles/vulkan-sycl-display.dir/main.cc.o -sys-header-deps -internal-isystem /home/mavrogato/sycl_workspace/llvm/build/bin/../include/sycl/stl_wrappers -internal-isystem /home/mavrogato/sycl_workspace/llvm/build/bin/../include -internal-isystem /home/mavrogato/sycl_workspace/llvm/build/bin/../include/sycl/stl_wrappers -internal-isystem /home/mavrogato/sycl_workspace/llvm/build/bin/../include -I /home/mavrogato/work/2025/vulkan-sycl-display/build -I /home/mavrogato/work/2025/vulkan-sycl-display -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/14/../../../../include/c++/14 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/14/../../../../include/x86_64-linux-gnu/c++/14 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/14/../../../../include/c++/14/backward -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/14/../../../../include/c++/14 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/14/../../../../include/x86_64-linux-gnu/c++/14 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/14/../../../../include/c++/14/backward -internal-isystem /home/mavrogato/sycl_workspace/llvm/build/lib/clang/22/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/14/../../../../x86_64-linux-gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /home/mavrogato/sycl_workspace/llvm/build/lib/clang/22/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/14/../../../../x86_64-linux-gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -Wall -Wextra -std=gnu++23 -fdeprecated-macro -ferror-limit 19 -fmessage-length=115 -fgpu-rdc -fgnuc-version=4.2.1 -fno-implicit-modules -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/main-7b9700.bc -x c++ /home/mavrogato/work/2025/vulkan-sycl-display/main.cc

  1. parser at end of file
  2. Per-file LLVM IR generation
  3. /home/mavrogato/work/2025/vulkan-sycl-display/main.cc:47:6: Generating code for declaration 'simple_coroutine_func' Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var LLVM_SYMBOLIZER_PATH to point to it): 0 clang-22 0x00005f35904b3c82 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 66 1 clang-22 0x00005f35904b0e9a llvm::sys::RunSignalHandlers() + 58 2 clang-22 0x00005f35904b104c 3 libc.so.6 0x000079a642445330 4 libc.so.6 0x000079a64249eb2c pthread_kill + 284 5 libc.so.6 0x000079a64244527e gsignal + 30 6 libc.so.6 0x000079a6424288ff abort + 223 7 libc.so.6 0x000079a64242881b 8 libc.so.6 0x000079a64243b517 9 clang-22 0x00005f3590d7e30c clang::CodeGen::CodeGenFunction::EmitCoroutineBody(clang::CoroutineBodyStmt const&) + 8684 10 clang-22 0x00005f3590a3f14b clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) + 1499 11 clang-22 0x00005f3590aabc5e clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) + 478 12 clang-22 0x00005f3590aa61d5 clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) + 581 13 clang-22 0x00005f3590ab2ab6 clang::CodeGen::CodeGenModule::EmitDeferred() + 342 14 clang-22 0x00005f3590ab2ae3 clang::CodeGen::CodeGenModule::EmitDeferred() + 387 15 clang-22 0x00005f3590ab2ae3 clang::CodeGen::CodeGenModule::EmitDeferred() + 387 16 clang-22 0x00005f3590ab4208 clang::CodeGen::CodeGenModule::Release() + 120 17 clang-22 0x00005f3590e388e1 18 clang-22 0x00005f3590e32b0b clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 1659 19 clang-22 0x00005f3592bd373c clang::ParseAST(clang::Sema&, bool, bool) + 1276 20 clang-22 0x00005f3590e3606c clang::CodeGenAction::ExecuteAction() + 2668 21 clang-22 0x00005f35911c9477 clang::FrontendAction::Execute() + 55 22 clang-22 0x00005f3591148aad clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 1613 23 clang-22 0x00005f35912c2bbe clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 478 24 clang-22 0x00005f358ebf53d9 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 8569 25 clang-22 0x00005f358ebea517 26 clang-22 0x00005f358ebef1bf clang_main(int, char**, llvm::ToolContext const&) + 4751 27 clang-22 0x00005f358eadd24a main + 106 28 libc.so.6 0x000079a64242a1ca 29 libc.so.6 0x000079a64242a28b __libc_start_main + 139 30 clang-22 0x00005f358ebe9b25 _start + 37 clang++: error: unable to execute command: Aborted (core dumped) clang++: error: clang frontend command failed due to signal (use -v to see invocation) Intel SYCL compiler development build based on: clang version 22.0.0git ([email protected]:intel/llvm.git ccac4e0910c6a3a7dc684254ec46b58192159c23) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /home/mavrogato/sycl_workspace/llvm/build/bin Build config: +assertions clang++: note: diagnostic msg: Error generating preprocessed source(s). make[3]: *** [CMakeFiles/vulkan-sycl-display.dir/build.make:76: CMakeFiles/vulkan-sycl-display.dir/main.cc.o] Error 1 make[2]: *** [CMakeFiles/Makefile2:114: CMakeFiles/vulkan-sycl-display.dir/all] Error 2 make[1]: *** [CMakeFiles/Makefile2:174: CMakeFiles/run.dir/rule] Error 2 make: *** [Makefile:173: run] Error 2

main-9a1f70.sh

mavrogato avatar Nov 17 '25 16:11 mavrogato

Hi @mavrogato, thanks for the report.

Even through I agree that the compiler shouldn't crash, I doubt that this program will be considered a valid SYCL program. I.e. we will fix the crash, but there will likely be some error saying that you can't use coroutines in SYCL kernels (note: I'm not familiar with the feature at all, so maybe this specific example should just work with co_return behaving as a regular return)

AlexeySachkov avatar Nov 18 '25 13:11 AlexeySachkov