thrust
`__noinline__` Macro Definition causes `clang++` Compile Error
I'm working with clang++ 13.0 and CUDA Toolkit 11.6, and I suspect there is a problem with the __noinline__ macro. In Thrust, it is used as __attribute__((__noinline__)), which expects __noinline__ to expand to noinline. However, with clang++, __noinline__ expands to __attribute__((noinline)), which produces __attribute__((__attribute__((noinline)))) and causes a compile error. This happens when I compile main.cu, which includes <thrust/system/cuda/pointer.h>, with the following arguments:
clang++ --cuda-gpu-arch=sm_86 -std=c++17 -o main main.cu
Here's part of the compile output:
In file included from /opt/cuda/include/thrust/system/cuda/pointer.h:27:
In file included from /opt/cuda/include/thrust/detail/reference.h:25:
In file included from /opt/cuda/include/thrust/system/detail/adl/assign_value.h:42:
In file included from /opt/cuda/include/thrust/system/cuda/detail/assign_value.h:25:
In file included from /opt/cuda/include/thrust/system/cuda/detail/copy.h:100:
In file included from /opt/cuda/include/thrust/system/cuda/detail/internal/copy_cross_system.h:42:
In file included from /opt/cuda/include/thrust/detail/temporary_array.h:40:
In file included from /opt/cuda/include/thrust/detail/contiguous_storage.h:21:
In file included from /opt/cuda/include/thrust/detail/allocator/allocator_traits.h:29:
In file included from /opt/cuda/include/thrust/detail/memory_wrapper.h:29:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.1.0/../../../../include/c++/12.1.0/memory:77:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.1.0/../../../../include/c++/12.1.0/bits/shared_ptr.h:53:
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.1.0/../../../../include/c++/12.1.0/bits/shared_ptr_base.h:196:22: error: use of undeclared identifier 'noinline'; did you mean 'inline'?
__attribute__((__noinline__))
^
/opt/cuda/include/crt/host_defines.h:83:24: note: expanded from macro '__noinline__'
__attribute__((noinline))
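To see the double expansion in isolation, here is a minimal host-only sketch. It assumes a GCC- or clang-style preprocessor and uses a hypothetical macro, FAKE_CUDA_NOINLINE, defined the same way as __noinline__ in crt/host_defines.h, so it builds without the CUDA Toolkit and without redefining a reserved name:

```cpp
#include <string>

// Hypothetical stand-in for the CUDA definition
// `#define __noinline__ __attribute__((noinline))`.
#define FAKE_CUDA_NOINLINE __attribute__((noinline))

// Two-level stringification: the argument is macro-expanded first,
// then turned into a string literal.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// The libstdc++ spelling __attribute__((__noinline__)), with the
// attribute token replaced by the stand-in macro, after expansion.
inline const char* expanded_attribute() {
  return STRINGIFY(__attribute__((FAKE_CUDA_NOINLINE)));
}

// Strips spaces so the result can be compared regardless of how the
// preprocessor spaces the expanded tokens.
inline std::string without_spaces(const char* s) {
  std::string out;
  for (; *s != '\0'; ++s) {
    if (*s != ' ') out += *s;
  }
  return out;
}
```

The resulting string is the nested __attribute__((__attribute__((noinline)))) form that clang reports as an error.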
I had the same issue. After changing the macro in /opt/cuda/include/crt/host_defines.h:83
from
#define __noinline__ __attribute__((noinline))
to
#define __noinline__ noinline
the code compiles.
After some research, I found that compiling with -stdlib=libc++ works fine. It seems that clang uses libstdc++, the standard library that ships with GCC, by default, and there the macro __noinline__ differs from Thrust's definition.
__noinline__ is not a macro in libstdc++; it's an attribute name, i.e. effectively a keyword. It's used as __attribute__((__noinline__)), which has been valid in GCC for many years. GCC has always allowed the __name__ form for an attribute token, as the safe way for the implementation to refer to an attribute without clashing with a macro name in the program.
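A minimal sketch of why the reserved spelling is safe, assuming GCC or clang without any CUDA headers. The user macro named noinline and the function add_slowly are hypothetical, chosen only to show the clash that the __noinline__ form avoids:

```cpp
// A user program may legally define a macro called `noinline`;
// the standard library cannot prevent this.
#define noinline "user macro"

// The reserved spelling is not touched by that macro, so this
// declaration still compiles.
__attribute__((__noinline__)) int add_slowly(int a, int b) { return a + b; }

// Spelling the attribute as __attribute__((noinline)) at this point
// would expand to __attribute__(("user macro")) and fail to compile.

#undef noinline
```

The CUDA headers break exactly this guarantee by turning the reserved token itself into a macro.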
I am not sure whether this is really a Thrust issue. We do not define __noinline__ in Thrust, and the code in question boils down to standard library includes.
The issue stems from the fact that the CUDA headers define __noinline__ (which should be a reserved compiler keyword) as a macro for the noinline attribute. This conflicts with recent changes in the GCC 12 standard library headers, where __noinline__ is used as the attribute name. The standard headers cannot use __attribute__((noinline)), because noinline is not a reserved identifier, so they use __attribute__((__noinline__)) instead.
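As a rough heuristic, a translation unit can check which standard library it is built against. The sketch below relies on the documented library macros __GLIBCXX__, _GLIBCXX_RELEASE and _LIBCPP_VERSION; the function names are hypothetical, and the "GCC 12 or newer" threshold is taken from this thread rather than from an exhaustive list of affected releases:

```cpp
#include <cstddef>  // any standard header pulls in the library's config macros

// True when building against libstdc++ from GCC 12 or newer, the
// release whose headers (per this thread) use __attribute__((__noinline__)).
constexpr bool uses_gcc12_or_newer_libstdcxx() {
#if defined(__GLIBCXX__) && defined(_GLIBCXX_RELEASE) && _GLIBCXX_RELEASE >= 12
  return true;
#else
  return false;
#endif
}

// True when building against LLVM's libc++, which the thread reports
// working with -stdlib=libc++.
constexpr bool uses_libcxx() {
#if defined(_LIBCPP_VERSION)
  return true;
#else
  return false;
#endif
}
```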
A possible workaround is to undefine the __noinline__ macro before including the system headers, and redefine it afterwards.
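A minimal sketch of that workaround, swapping the manual undefine/redefine for the #pragma push_macro / pop_macro pair that GCC and clang support. It assumes <memory> is the affected header (as in the log above); other affected standard headers would go in the same region. The #ifndef block only emulates the CUDA definition so the sketch also builds without the CUDA Toolkit, and square_not_inlined and make_answer are hypothetical:

```cpp
// Emulate the CUDA definition when it is absent; under clang's CUDA
// mode the real one from crt/host_defines.h is already in effect.
#ifndef __noinline__
#define __noinline__ __attribute__((noinline))
#endif

// Save the CUDA macro, hide it while the standard headers are parsed,
// then restore it. Include guards stop later indirect includes of
// <memory> (e.g. from Thrust) from re-parsing shared_ptr_base.h.
#pragma push_macro("__noinline__")
#undef __noinline__
#include <memory>
#pragma pop_macro("__noinline__")

// From here on __noinline__ again means __attribute__((noinline)).
__noinline__ int square_not_inlined(int x) { return x * x; }

inline std::shared_ptr<int> make_answer() { return std::make_shared<int>(42); }
```

In a real main.cu, the Thrust includes would follow the pop_macro line.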
Is there any progress on this issue? I'd like to use clang++ with libstdc++, but it seems impossible without modifying the headers.
It looks like this was caused by a recent change in libstdc++ here: https://github.com/gcc-mirror/gcc/commit/dbf8bd3c2f2cd2d27ca4f0fe379bd9490273c6d7#diff-b358f609a31a4af8af72cc3197566abaa157bb7f8681b45580f1e5477540457cR192-R193
However, this issue is unique to clang, as nvcc compiles the equivalent just fine:
__attribute__ (( __noinline__ )) void foo();
https://godbolt.org/z/Y6h99GcWb
This issue is unique to clang, and Thrust does not officially support clang as a CUDA device compiler.
We are happy to review and accept any PRs from the community that fix this problem without breaking any of our supported compiler platforms.
However, I don't believe there is anything we can do in Thrust to address this issue because as https://godbolt.org/z/Y6h99GcWb shows, this problem exists in clang without including any Thrust headers.
The issue is not related to thrust project, closing it.
> However, this issue is unique to clang as nvcc compiles the equivalent just fine: __attribute__ (( __noinline__ )) void foo();
It compiles, but it's not fine:
$ echo '__attribute__((__noinline__)) void foo();' | nvcc -x cu -dD -E - | tail -1
__attribute__((__attribute__((noinline)))) void foo();
AFAICT, nvcc just ignores an unknown attribute with the name __attribute__((noinline)) expanded from __noinline__.
Indeed, I believe the nvcc frontend has special handling for that attribute expansion. clang would need to emulate that "special" handling 🙂
Right. The __attribute__((__attribute__((noinline)))) void foo(); gets magically transformed into __attribute((noinline)) void foo() by the time it makes it to the final host compilation. 😭
And the magic seems to work only for __attribute__((__attribute__((noinline)))). Any other variants I tried error out.
So, it's been a known issue in the CUDA headers, deliberately worked around in NVCC. And now the bug lives on and keeps giving...
> So, it's been a known issue in the CUDA headers, deliberately worked around in NVCC.
NVIDIA engineers “fixed” the compiler instead of fixing the header files? That's brilliant!