libcudacxx icon indicating copy to clipboard operation
libcudacxx copied to clipboard

`constexpr` globals are unable to ODR-used in device code

Open wmaxey opened this issue 5 years ago • 5 comments

constexpr globals are unable to be ODR used in device code. Until device support for them is available we might need to use some workarounds or tricks.

One of which is replacing constexpr with __constant__. We should replace hacks like this with a define instead:

https://github.com/NVIDIA/libcudacxx/blob/main/libcxx/include/tuple#L1159

#ifdef __CUDA_ARCH__
  _LIBCUDACXX_INLINE_VAR __constant__ __ignore_t<unsigned char> ignore = __ignore_t<unsigned char>();
#else
  _LIBCUDACXX_INLINE_VAR constexpr __ignore_t<unsigned char> ignore = __ignore_t<unsigned char>();
#endif

Perhaps turning the above into:

_LIBCUDACXX_INLINE_VAR _LIBCUDACXX_CEXPR_GLOBAL __ignore_t<unsigned char> ignore = __ignore_t<unsigned char>();

wmaxey avatar Oct 03 '20 00:10 wmaxey

I don't think it's well defined to use __CUDA_ARCH__ at namespace scope. That would be considered host code and the docs are pretty clear that you shouldn't do this.

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architecture-identification-macro

The host code (the non-GPU code) must not depend on it.

jrhemstad avatar Oct 03 '20 01:10 jrhemstad

Hmm. That may be correct. Foiled by nvcc again, it seems...

griwes avatar Oct 03 '20 01:10 griwes

UB aside, it looks like the example doesn't work anyways: https://godbolt.org/z/GEns8K

Even if it did, you wouldn't be able to use the value as a constant expression in device code.

jrhemstad avatar Oct 03 '20 01:10 jrhemstad

I agree with the above, but this is just for satisfying an overload for now. We should figure out what the correct thing to do would be for this.

I think, until there is support, that we should look at removing these constructs and associated features.

wmaxey avatar Oct 03 '20 01:10 wmaxey

In our original discussions with the NVCC team, we did discuss either errataing or removing this.

I think we should:

  • Leave these features in.
  • Add errata explaining the limitations.

@wmaxey do you want to take point on that?

brycelelbach avatar Oct 29 '20 17:10 brycelelbach

To be sure, this does work with trivial types.

For ranges CPOs I moved to this hack

#if defined(__CUDA_ARCH__)
#  define _LIBCUDACXX_CPO_ACCESSIBILITY static __device__
#else
#  define _LIBCUDACXX_CPO_ACCESSIBILITY
#endif

However, that only works badly because the CPOs are typically stateless. The real fun starts when we need statefull clobals such as C++20 std::strong_ordering::less

miscco avatar Feb 23 '23 11:02 miscco

Converting to discussion. This is less an issue and more an ongoing thing to think about.

wmaxey avatar Feb 23 '23 17:02 wmaxey