cccl Investigate building with libc++ and clang-cuda

We currently do not support libc++, but this is something we might want to test better in the future.

This explores building our libcu++ tests with clang-cuda.

Nov 12 '25 08:11 miscco

@Artem-B for your interest, I am running into a brick wall with catch2, but I hope that that should be doable

Nov 12 '25 08:11 miscco

😬 CI Workflow Results

🟥 Finished in 1m 45s: Pass: 0%/2 | Total: 3m 28s | Max: 1m 44s

See results here.

Nov 12 '25 08:11 github-actions[bot]

I do not understand what is going wrong here. I am building locally in the devcontainer and it builds just fine.

Nov 12 '25 09:11 miscco

For what it's worth I'm building/running all catch-2 based tests in cub in v3.1 without major issues, so it should be doable.

AFAICT, there should be no recent clang changes that would affect the clang itself, so clang builds should work for at least a few recent releases. There were some libc++ changes, though, so there may be some surprises there.

I'd be happy to help sorting it out. Probably the best way for me to reproduce the exact issues you're running into would be via a docker container.

Nov 12 '25 17:11 Artem-B

It's really strange: for whatever reason, in CI and only in CI, it misses the include folder for libc++.

At the same time we have issues with catch2 not being built with the right library. I need to look at how this is set up.

Nov 12 '25 17:11 miscco

Is this the issue you're fighting:

 /usr/bin/sccache /usr/bin/clang++  -DCCCL_ENABLE_ASSERTIONS -DCCCL_ENABLE_OPTIONAL_REF -DCCCL_IGNORE_DEPRECATED_CPP_DIALECT -DCCCL_IGNORE_DEPRECATED_DISCARD_MEMORY_HEADER -DCCCL_IGNORE_DEPRECATED_STREAM_REF_HEADER -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE -D_ALLOW_UNSUPPORTED_LIBCPP=1 -D_CCCL_HEADER_TEST -D_CCCL_NO_SYSTEM_HEADER -Dinternal_headertest_cuda___annotated_ptr_createpolicy_h_EXPORTS -I/home/coder/cccl/libcudacxx/include -O3 -DNDEBUG -std=c++20 --cuda-gpu-arch=sm_75 --cuda-gpu-arch=sm_80 --cuda-gpu-arch=sm_90 --cuda-gpu-arch=sm_100 --cuda-path=/usr/local/cuda -fPIC -stdlib=libc++ -Xclang=-fcuda-allow-variadic-functions -Wno-unknown-cuda-version -MD -MT libcudacxx/CMakeFiles/internal_headertest_cuda___annotated_ptr_createpolicy.h.dir/headers/cuda___annotated_ptr_createpolicy.h.cu.o -MF libcudacxx/CMakeFiles/internal_headertest_cuda___annotated_ptr_createpolicy.h.dir/headers/cuda___annotated_ptr_createpolicy.h.cu.o.d -x cuda -c /home/coder/cccl/build/cuda12.9-llvm20/libcudacxx-cpp20/libcudacxx/headers/cuda___annotated_ptr_createpolicy.h.cu -o libcudacxx/CMakeFiles/internal_headertest_cuda___annotated_ptr_createpolicy.h.dir/headers/cuda___annotated_ptr_createpolicy.h.cu.o
  In file included from <built-in>:1:
  In file included from /usr/lib/llvm-20/lib/clang/20/include/__clang_cuda_runtime_wrapper.h:41:
  /usr/lib/llvm-20/lib/clang/20/include/cuda_wrappers/cmath:30:2: error: "Could not find standard C++ header 'cmath'. Add -v to your compilation command to check the include paths being searched. You may need to install the appropriate standard C++ library package corresponding to the search path."
     30 | #error "Could not find standard C++ header 'cmath'. Add -v to your compilation command to check the include paths being searched. You may need to install the appropriate standard C++ library package corresponding to the search path."
        |  ^
  In file included from <built-in>:1:
  /usr/lib/llvm-20/lib/clang/20/include/__clang_cuda_runtime_wrapper.h:42:10: fatal error: 'cstdlib' file not found
     42 | #include <cstdlib>
        |          ^~~~~~~~~
  2 errors generated when compiling for sm_100.

There are a couple of things that look odd in that build. First is that the failures are in the arm64 builds. Do the same failures happen on intel hosts?

Another observation is that C++ compilations succeed, but CUDA compilations fail. That means that the libc++ headers are most likely available, but something in CUDA build goes wrong. One way to debug it is to run compilations with -v as the error message suggested and see where C++ and CUDA compilations end up looking for the headers. I expect CUDA compilation may be looking in the wrong place for some reason.
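To make the `-v` comparison less tedious, here is a minimal sketch that pulls the `#include <...>` search list out of captured `clang -v` output, so the C++ and CUDA lists can be compared side by side. The helper names are hypothetical, and the sample text reuses paths from a verbose listing posted in this thread; the `End of search list.` terminator is what clang normally prints after the list.

```python
def extract_include_dirs(verbose_output: str) -> list[str]:
    """Return the directories listed under '#include <...> search starts here:'."""
    dirs: list[str] = []
    in_block = False
    for line in verbose_output.splitlines():
        text = line.strip()
        if text == "#include <...> search starts here:":
            in_block = True
        elif in_block:
            # The list ends at the first line that is not an absolute path,
            # e.g. "End of search list." or an elided "..." in a pasted log.
            if not text.startswith("/"):
                break
            dirs.append(text)
    return dirs


def libcxx_dirs(dirs: list[str]) -> list[str]:
    """Keep only entries that look like a libc++ header directory (assumed heuristic)."""
    return [d for d in dirs if d.rstrip("/").endswith("c++/v1")]


# Sample text modelled on a verbose listing from this thread.
SAMPLE = """\
#include "..." search starts here:
#include <...> search starts here:
   /home/coder/cccl/libcudacxx/include
   /usr/lib/llvm-20/lib/clang/20/include/cuda_wrappers
   /usr/include/c++/v1
   /usr/lib/llvm-20/lib/clang/20/include
   /usr/local/cuda/include
End of search list.
"""

if __name__ == "__main__":
    found = extract_include_dirs(SAMPLE)
    print("search list:", found)
    print("libc++ entries:", libcxx_dirs(found))
```

An empty result from `libcxx_dirs` for the CUDA compilation, next to a non-empty one for the C++ compilation, would point at the driver rather than at the installed packages.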

Nov 12 '25 17:11 Artem-B

The error is:

  In file included from <built-in>:1:
  In file included from /usr/lib/llvm-20/lib/clang/20/include/__clang_cuda_runtime_wrapper.h:41:
  /usr/lib/llvm-20/lib/clang/20/include/cuda_wrappers/cmath:30:2: error: "Could not find standard C++ header 'cmath'. Add -v to your compilation command to check the include paths being searched. You may need to install the appropriate standard C++ library package corresponding to the search path."
     30 | #error "Could not find standard C++ header 'cmath'. Add -v to your compilation command to check the include paths being searched. You may need to install the appropriate standard C++ library package corresponding to the search path."
        |  ^
  In file included from <built-in>:1:
  /usr/lib/llvm-20/lib/clang/20/include/__clang_cuda_runtime_wrapper.h:42:10: fatal error: 'cstdlib' file not found
     42 | #include <cstdlib>

Nov 12 '25 17:11 miscco

To be clear, this is clang-cuda, not NVCC. I tried NVCC, and there is no chance of making that work without considerable changes to the runtime.

Nov 12 '25 17:11 miscco

The main culprit is that libc++ relies heavily on new clang compiler builtins that NVCC does not support, so it just breaks on, e.g., __builtin_common_type.

Nov 12 '25 17:11 miscco

Wrong thread? It does not look like nvcc or builtins are involved here.

Nov 12 '25 17:11 Artem-B

Wrong thread? It does not look like nvcc or builtins are involved here.

Oh, that was about NVCC: NVCC with modern libc++ is a no-go because of the heavy use of clang compiler builtins.

I am fine with clang-cuda and libc++ only

Nov 12 '25 17:11 miscco

I see that we use --cuda-path=/usr/local/cuda in the invocation. Could it be that it overrides the include folders?

Nov 12 '25 17:11 miscco

OK, back to the missing libc++ headers in CUDA compilation with clang. Would it be possible to tweak the patch and temporarily add -v to the C++ and CUDA compilations? That may give us more clues about what's going on.

Nov 12 '25 17:11 Artem-B

--cuda-path is unlikely to affect libc++ inclusion, and the compilation itself does not seem to add any -I paths that I'd consider to be a problem. Stuff that I would worry about would be things like -isystem /usr/include or -I/path/to/custom/libc++/include. Those would likely break the required include order.
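As a rough way to check a long compile command for the kinds of flags mentioned above, the following sketch scans for `-I` and `-isystem` entries that point at `/usr/include` or at something that looks like a C++ standard library directory. The function names and the "contains `c++`" heuristic are assumptions made for illustration, not a rule that clang itself applies.

```python
import posixpath
import shlex


def looks_risky(path: str) -> bool:
    """Heuristic (assumed): system include root or a C++ standard library directory."""
    return posixpath.normpath(path) == "/usr/include" or "c++" in path


def risky_include_flags(command: str) -> list[str]:
    """Return the -I / -isystem flags in a command line that may disturb include order."""
    tokens = shlex.split(command)
    flagged: list[str] = []
    for i, tok in enumerate(tokens):
        for prefix in ("-isystem", "-I"):
            if tok == prefix:
                # Separate form: "-isystem /usr/include"
                path = tokens[i + 1] if i + 1 < len(tokens) else ""
            elif tok.startswith(prefix):
                # Joined form: "-I/some/dir"
                path = tok[len(prefix):]
            else:
                continue
            if looks_risky(path):
                flagged.append(f"{prefix} {path}")
            break
    return flagged


if __name__ == "__main__":
    # Shortened version of the failing CI invocation from this thread.
    ci_command = (
        "/usr/bin/clang++ -I/home/coder/cccl/libcudacxx/include -std=c++20 "
        "--cuda-gpu-arch=sm_75 --cuda-path=/usr/local/cuda -stdlib=libc++ -x cuda -c t.cu"
    )
    print(risky_include_flags(ci_command))
```

Run against the shortened CI command, nothing is flagged, which matches the observation that the invocation itself looks clean.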

Nov 12 '25 17:11 Artem-B

😬 CI Workflow Results

🟥 Finished in 1m 47s: Pass: 0%/2 | Total: 3m 30s | Max: 1m 46s

See results here.

Nov 12 '25 17:11 github-actions[bot]

Looks like it is indeed not looking at the right directories https://github.com/NVIDIA/cccl/actions/runs/19306545198/job/55215540328?pr=6592#step:4:736

We are missing the /usr/lib/llvm-20/include/c++/v1/ include path

Nov 12 '25 17:11 miscco

If I build locally I get:

#include "..." search starts here:
#include <...> search starts here:
   /home/coder/cccl/libcudacxx/include
   /usr/lib/llvm-20/lib/clang/20/include/cuda_wrappers
   /usr/lib/llvm-20/bin/../include/c++/v1
   /usr/lib/llvm-20/lib/clang/20/include
   /usr/local/include
   /usr/include/x86_64-linux-gnu
   /usr/include
   /usr/local/cuda/include
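The local list spells the libc++ directory as `/usr/lib/llvm-20/bin/../include/c++/v1`, which is the same directory as the `/usr/lib/llvm-20/include/c++/v1/` path reported missing in CI. A small sketch of a normalized lookup, so that `..` segments and trailing slashes do not hide a match; the helper name is hypothetical, and `posixpath.normpath` only rewrites the string, it does not resolve symlinks.

```python
import posixpath

# Search list from the local devcontainer build quoted above.
LOCAL_SEARCH_LIST = [
    "/home/coder/cccl/libcudacxx/include",
    "/usr/lib/llvm-20/lib/clang/20/include/cuda_wrappers",
    "/usr/lib/llvm-20/bin/../include/c++/v1",
    "/usr/lib/llvm-20/lib/clang/20/include",
    "/usr/local/include",
    "/usr/include/x86_64-linux-gnu",
    "/usr/include",
    "/usr/local/cuda/include",
]


def has_dir(search_list: list[str], wanted: str) -> bool:
    """True if `wanted` is in the list after purely textual path normalization."""
    target = posixpath.normpath(wanted)
    return any(posixpath.normpath(entry) == target for entry in search_list)


if __name__ == "__main__":
    print(has_dir(LOCAL_SEARCH_LIST, "/usr/lib/llvm-20/include/c++/v1/"))
```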

Nov 12 '25 17:11 miscco

That is indeed suspicious. We do not have -v on C++ compilations in the CI build, so we do not see where they pick up C++ headers in that environment.

We are missing the /usr/lib/llvm-20/include/c++/v1/ include path

Yup. The question is why. The logs do not mention that path at all, so this does not look like the case where the clang driver added it and then ignored it because the directory did not exist. Somehow it just isn't added. I have never seen that before.

Where does clang come from in the CI environment? Perhaps there's something funky about the compiler itself?

Nov 12 '25 18:11 Artem-B

It's built here: https://github.com/rapidsai/devcontainers/actions/runs/19273329169/job/55123065290

Nov 12 '25 18:11 miscco

🥇 https://github.com/rapidsai/devcontainers/actions/runs/19273329169/job/55123065162#step:7:9268

Nov 12 '25 18:11 miscco

😬 CI Workflow Results

🟥 Finished in 5m 43s: Pass: 0%/4 | Total: 7m 10s | Max: 1m 53s

See results here.

Nov 12 '25 18:11 github-actions[bot]

I believe that removal is a red herring; something else is going on.

Nov 12 '25 18:11 miscco

Oh no, I found it: it's our clang-format install 😢

https://github.com/rapidsai/devcontainers/actions/runs/19273329169/job/55123065162#step:7:8482

Nov 12 '25 18:11 miscco

Are you saying the CI build was not using the compiler it was supposed to be using?

If that's the case, I'm still puzzled how it ended up skipping libc++ includes. I've checked the clang code, and cuda compilation just calls the host toolchain to insert libc++ include path. It should work the same for C++ and for CUDA compilations with the same clang.

Nov 12 '25 18:11 Artem-B

😬 CI Workflow Results

🟥 Finished in 3m 26s: Pass: 0%/4 | Total: 6m 59s | Max: 1m 52s

See results here.

Nov 13 '25 13:11 github-actions[bot]

😬 CI Workflow Results

🟥 Finished in 56m 52s: Pass: 0%/4 | Total: 2h 19m | Max: 56m 48s

See results here.

Nov 13 '25 16:11 github-actions[bot]

I am super puzzled, I verified that the devcontainers now have libc++ inside

Nov 13 '25 16:11 miscco

I am super puzzled, I verified that the devcontainers now have libc++ inside

Given that C++ code compiles fine, they do. It's that clang for some reason does not add include path to them during CUDA compilation. It's very puzzling, indeed.

My next step would be to replicate that ARM environment locally and try reproducing the issue manually, i.e. start by verifying whether a minimal invocation of the compiler with "-x cuda" is capable of compiling an empty file. We're failing while parsing the pre-included headers, so we do not even need any user code.
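A minimal sketch of such a probe, assuming `clang++` is on `PATH` and that `/dev/null` is acceptable as the empty input. The `--cuda-path`, `--cuda-gpu-arch` and `-stdlib=libc++` values are copied from the failing CI command; `-v` and `-fsyntax-only` are additions for this probe, and the function names are hypothetical.

```python
import subprocess


def minimal_cuda_probe(
    compiler: str = "clang++",
    cuda_path: str = "/usr/local/cuda",
    gpu_arch: str = "sm_75",
) -> list[str]:
    """Build an argv that compiles an empty CUDA translation unit verbosely."""
    return [
        compiler,
        "-v",                      # print the include search lists
        "-x", "cuda",              # treat the input as CUDA
        "-stdlib=libc++",
        f"--cuda-path={cuda_path}",
        f"--cuda-gpu-arch={gpu_arch}",
        "-fsyntax-only",           # assumption: parsing alone is enough to hit the error
        "/dev/null",               # empty input; only the pre-included headers get parsed
    ]


def run_probe(argv: list[str]) -> tuple[int, str]:
    """Run the probe and return (exit code, stderr); clang writes -v output to stderr."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    print(" ".join(minimal_cuda_probe()))
```

Calling `run_probe(minimal_cuda_probe())` inside the CI container and inside the local devcontainer should show whether the libc++ directory is missing from the search list before any project code is involved.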

Unfortunately I do not have immediate access to an arm64 host, so I can't try it myself.

Nov 13 '25 17:11 Artem-B

It fails for both arm64 and amd64, so it's not dependent on that.

Nov 13 '25 18:11 miscco

OK, in the latest run the compiler appears to pick up the libc++ include path correctly, and it succeeds in compiling CUDA code:

#include <...> search starts here:
   /home/coder/cccl/libcudacxx/include
   /usr/lib/llvm-20/lib/clang/20/include/cuda_wrappers
   /usr/include/c++/v1
   /usr/lib/llvm-20/lib/clang/20/include
   /usr/local/include
   /usr/include/x86_64-linux-gnu
   /usr/include
   /usr/local/cuda/include
...
  "/usr/local/cuda/bin/ptxas" -m64 -O3 -v --gpu-name sm_90 --output-file /tmp/cuda___annotated_ptr_associate_access_property-sm_90-48b5fa.o /tmp/cuda___annotated_ptr_associate_access_property-sm_90-e7ebb7.s
  ptxas info    : 0 bytes gmem
   "/usr/local/cuda/bin/fatbinary" -64 --create /tmp/cuda___annotated_ptr_associate_access_property-a9e153.fatbin --image=profile=sm_100,file=/tmp/cuda___annotated_ptr_associate_access_property-sm_100-0db165.o --image=profile=sm_75,file=/tmp/cuda___annotated_ptr_associate_access_property-sm_75-9069dc.o --image=profile=sm_80,file=/tmp/cuda___annotated_ptr_associate_access_property-sm_80-0f9c71.o --image=profile=sm_90,file=/tmp/cuda___annotated_ptr_associate_access_property-sm_90-48b5fa.o

So, this part of the problem appears to be solved.

The build still fails, but that's a different issue. Now the build fails during the linking phase. I suspect the root cause is that the build mixes compilations with libc++ and libstdc++. I think the C++ parts of catch2 are compiled with the default C++ library choice, which is libstdc++. The CUDA parts are built with libc++. When we attempt to link them all together, there's an obvious mismatch.
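One way to test that suspicion from the linker output is to look at which standard library the unresolved symbols belong to. libc++ places its types in the inline namespace `std::__1` (mangled as `St3__1`), while libstdc++ with the C++11 ABI uses `std::__cxx11` (mangled as `St7__cxx11`) for types such as `std::string`. The sketch below classifies symbol names on that basis; the function names are hypothetical and the example symbols are illustrative, not taken from the CI logs.

```python
from collections import Counter


def guess_stdlib(symbol: str) -> str:
    """Guess the owning standard library from a mangled or demangled symbol name."""
    if "St3__1" in symbol or "std::__1::" in symbol:
        return "libc++"
    if "St7__cxx11" in symbol or "std::__cxx11::" in symbol:
        return "libstdc++"
    return "unknown"


def summarize(symbols: list[str]) -> Counter:
    """Count how many symbols point at each standard library."""
    return Counter(guess_stdlib(s) for s in symbols)


if __name__ == "__main__":
    # Illustrative symbols only.
    examples = [
        "_ZNSt3__14coutE",
        "_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1Ev",
        "std::__1::basic_string<char>::size() const",
    ]
    print(summarize(examples))
```

If the undefined references at link time are mostly `std::__1` symbols requested by catch2-facing code, or `std::__cxx11` symbols requested by the CUDA objects, that would support the mixed-library explanation.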

Nov 13 '25 19:11 Artem-B