This nvcc version does not fully support C++20 `std::source_location`
```
python -m minisgl --model "../../Qwen/Qwen3-0.6B"
```

```
[2025-12-20|08:57:37] INFO Parsed arguments: ServerArgs(model_path='../../Qwen/Qwen3-0.6B', tp_info=DistributedInfo(rank=0, size=1), dtype=torch.bfloat16, max_running_req=256, attention_backend='auto', cuda_graph_bs=None, cuda_graph_max_bs=None, page_size=1, memory_ratio=0.9, distributed_timeout=60.0, use_dummy_weight=False, use_pynccl=True, max_seq_len_override=None, num_page_override=None, max_extend_tokens=8192, cache_type='radix', offline_mode=False, _unique_suffix='.pid=2657', server_host='127.0.0.1', server_port=1919, num_tokenizer=0, silent_output=False)
[2025-12-20|08:57:40|initializer] INFO Tokenize server 0 is ready
[2025-12-20|08:57:41|core|rank=0] INFO Free memory before loading model: 78.82 GiB
[2025-12-20|08:57:41|core|rank=0] INFO Allocating 650122 pages for KV cache, K + V = 69.44 GiB
[2025-12-20|08:57:41|core|rank=0] INFO Auto-selected attention backend: fi
[2025-12-20|08:57:41|core|rank=0] INFO Free memory after initialization: 7.71 GiB
[2025-12-20|08:57:41|core|rank=0] INFO Start capturing CUDA graphs with sizes: [160, 152, 144, 136, 128, 120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32, 24, 16, 8, 4, 2, 1]
[2025-12-20|08:57:41|core|rank=0] INFO Free GPU memory before capturing CUDA graphs: 7.64 GiB
Process minisgl-TP0-scheduler:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/autodl-tmp/mini-sglang/python/minisgl/server/launch.py", line 21, in _run_scheduler
    scheduler = Scheduler(args)
                ^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/mini-sglang/python/minisgl/scheduler/scheduler.py", line 84, in __init__
    self.engine = Engine(config)
                  ^^^^^^^^^^^^^^
  File "/root/autodl-tmp/mini-sglang/python/minisgl/engine/engine.py", line 101, in __init__
    self.graph_runner = GraphRunner(
                        ^^^^^^^^^^^^
  File "/root/autodl-tmp/mini-sglang/python/minisgl/engine/graph.py", line 96, in __init__
    self.logits[:] = model.forward()
                     ^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/mini-sglang/python/minisgl/models/qwen3.py", line 82, in forward
    output = self.model.forward(ctx.batch.input_ids)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/mini-sglang/python/minisgl/models/qwen3.py", line 61, in forward
    x = self.embed_tokens.forward(input_ids)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/mini-sglang/python/minisgl/layers/embedding.py", line 35, in forward
    y = indexing(
        ^^^^^^^^^
  File "/root/autodl-tmp/mini-sglang/python/minisgl/kernel/index.py", line 48, in indexing
    module = _jit_index_module(element_size, num_splits=num_splits)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/mini-sglang/python/minisgl/kernel/index.py", line 23, in _jit_index_module
    return load_jit(
           ^^^^^^^^^
  File "/root/autodl-tmp/mini-sglang/python/minisgl/kernel/utils.py", line 120, in load_jit
    return load_inline(
           ^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tvm_ffi/cpp/extension.py", line 892, in load_inline
    build_inline(
  File "/root/miniconda3/lib/python3.12/site-packages/tvm_ffi/cpp/extension.py", line 740, in build_inline
    return _build_impl(
           ^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tvm_ffi/cpp/extension.py", line 540, in _build_impl
    build_ninja(str(build_dir))
  File "/root/miniconda3/lib/python3.12/site-packages/tvm_ffi/cpp/extension.py", line 412, in build_ninja
    raise RuntimeError("\n".join(msg))
RuntimeError: ninja exited with status 2
stdout:
[1/2] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cuda_0.o.d -Xcompiler -fPIC -std=c++17 -O2 -gencode=arch=compute_80,code=sm_80 -std=c++20 -O3 --expt-relaxed-constexpr -I/root/miniconda3/lib/python3.12/site-packages/tvm_ffi/include -I/root/miniconda3/lib/python3.12/site-packages/tvm_ffi/include -I/root/autodl-tmp/mini-sglang/python/minisgl/kernel/csrc/include -c /root/.cache/tvm-ffi/minisgl__index_2048_4_128_1_false_f634a34728e4e029/cuda.cu -o cuda_0.o
FAILED: [code=2] cuda_0.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cuda_0.o.d -Xcompiler -fPIC -std=c++17 -O2 -gencode=arch=compute_80,code=sm_80 -std=c++20 -O3 --expt-relaxed-constexpr -I/root/miniconda3/lib/python3.12/site-packages/tvm_ffi/include -I/root/miniconda3/lib/python3.12/site-packages/tvm_ffi/include -I/root/autodl-tmp/mini-sglang/python/minisgl/kernel/csrc/include -c /root/.cache/tvm-ffi/minisgl__index_2048_4_128_1_false_f634a34728e4e029/cuda.cu -o cuda_0.o
nvcc warning : incompatible redefinition for option 'std', the last value of this option was used
nvcc warning : incompatible redefinition for option 'optimize', the last value of this option was used
/root/autodl-tmp/mini-sglang/python/minisgl/kernel/csrc/include/minisgl/utils.h(44): error: call to consteval function "std::source_location::current" did not produce a valid constant expression
      std::source_location::current()) {
                            ^
/usr/include/c++/11/source_location(59): note #2703-D: cannot call non-constexpr function "__builtin_source_location" (declared implicitly)
      current(__builtin_ret_type __p = __builtin_source_location()) noexcept
      ^
/root/autodl-tmp/mini-sglang/python/minisgl/kernel/csrc/include/minisgl/utils.h(55): error: call to consteval function "std::source_location::current" did not produce a valid constant expression
      std::source_location location = std::source_location::current()) {
                                                            ^
/usr/include/c++/11/source_location(59): note #2703-D: cannot call non-constexpr function "__builtin_source_location" (declared implicitly)
      current(__builtin_ret_type __p = __builtin_source_location()) noexcept
      ^
/root/autodl-tmp/mini-sglang/python/minisgl/kernel/csrc/include/minisgl/tensor.h(415): error: call to consteval function "std::source_location::current" did not produce a valid constant expression
      Loc_t loc = Loc_t::current()) const && -> const TensorMatcher && {
                         ^
/usr/include/c++/11/source_location(59): note #2703-D: cannot call non-constexpr function "__builtin_source_location" (declared implicitly)
      current(__builtin_ret_type __p = __builtin_source_location()) noexcept
      ^
/root/autodl-tmp/mini-sglang/python/minisgl/kernel/csrc/include/minisgl/utils.cuh(65): error: call to consteval function "std::source_location::current" did not produce a valid constant expression
      std::source_location location = std::source_location::current())
                                                            ^
/usr/include/c++/11/source_location(59): note #2703-D: cannot call non-constexpr function "__builtin_source_location" (declared implicitly)
      current(__builtin_ret_type __p = __builtin_source_location()) noexcept
      ^
/root/autodl-tmp/mini-sglang/python/minisgl/kernel/csrc/include/minisgl/utils.cuh(74): error: call to consteval function "std::source_location::current" did not produce a valid constant expression
      CUDA_CHECK(std::source_location location = std::source_location::current())
                                                                       ^
/usr/include/c++/11/source_location(59): note #2703-D: cannot call non-constexpr function "__builtin_source_location" (declared implicitly)
      current(__builtin_ret_type __p = __builtin_source_location()) noexcept
      ^
5 errors detected in the compilation of "/root/.cache/tvm-ffi/minisgl__index_2048_4_128_1_false_f634a34728e4e029/cuda.cu".
ninja: build stopped: subcommand failed.
```
```
gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

```
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
```
```
nvidia-smi
Sat Dec 20 09:01:01 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.06              Driver Version: 580.65.06      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A800 80GB PCIe          On  |   00000000:4F:00.0 Off |                  Off |
| N/A   31C    P0             44W /  300W |       0MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A800 80GB PCIe          On  |   00000000:D5:00.0 Off |                  Off |
| N/A   32C    P0             41W /  300W |       0MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

Note that the simple command `nvcc test.cu -o test && ./test` works normally.
I fixed it by updating the nvcc version:

```
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
```
Hi! I mainly test the code with CUDA 12.9, so I can't confidently claim a minimum supported CUDA toolkit version yet.
I'll try to improve compatibility where possible and also document the CUDA requirements more clearly once they're confirmed.