
Support for `std::complex<__half>`?

Open leofang opened this issue 3 years ago • 3 comments

Hello, at @allisonvacanti's suggestion I am opening this issue to gather information. I am wondering whether libcu++ can support half complex natively.

We are evaluating the possibility of supporting complex32 (consisting of two half-precision floating-point numbers, i.e., __half) in CuPy; see cupy/cupy#3370. This is important for us to be able to work seamlessly with CUDA libraries that provide half complex support (such as cuFFT).

Currently, CuPy uses a clone of the Thrust headers for complex-number support (so we use thrust::complex<T> internally), but Thrust does not support __half natively. The only reason thrust::complex<__half> compiles without error in CuPy's internals is that we provide a conversion operator from __half to float, so the single-precision specializations are used. That is neither optimal nor desirable in terms of performance.
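To make the conversion-operator effect concrete, here is a minimal host-only sketch. `fake_half`, `toy_complex`, and `multiply` are hypothetical stand-ins invented for illustration; they are not `__half`, `thrust::complex`, or CuPy's actual code. The sketch only shows that a type with an implicit conversion to `float` and no arithmetic operators of its own ends up doing all of its math in `float`.

```cpp
#include <type_traits>

// Hypothetical stand-in for __half so the sketch compiles on the host.
// It stores a float for simplicity; a real half is 16 bits wide.
struct fake_half {
    float value;
    constexpr fake_half(float v = 0.0f) : value(v) {}
    // Implicit conversion to float, analogous to the operator described above.
    constexpr operator float() const { return value; }
};

// Minimal generic complex type (an assumption, not thrust::complex).
template <typename T>
struct toy_complex {
    T re, im;
};

// Generic complex multiply. For T = fake_half, each operand is converted to
// float first, the arithmetic runs in float, and the result is converted back.
template <typename T>
constexpr toy_complex<T> multiply(toy_complex<T> a, toy_complex<T> b) {
    return { T(a.re * b.re - a.im * b.im), T(a.re * b.im + a.im * b.re) };
}

// fake_half defines no operator*, so the built-in float operator* is selected.
static_assert(std::is_same<decltype(fake_half{} * fake_half{}), float>::value,
              "arithmetic on fake_half is carried out in float");

// (1 + 2i) * (3 + 4i) = -5 + 10i, computed via float under the hood.
constexpr toy_complex<fake_half> product =
    multiply(toy_complex<fake_half>{fake_half(1.0f), fake_half(2.0f)},
             toy_complex<fake_half>{fake_half(3.0f), fake_half(4.0f)});
static_assert(float(product.re) == -5.0f, "real part");
static_assert(float(product.im) == 10.0f, "imaginary part");
```

The code compiles and gives correct results, but nothing in it ever operates on a 16-bit value directly, which is the performance concern raised above.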

Thanks.

leofang, Dec 18 '20 19:12

Hi Leo, I've been pondering this myself. One problem is that __half is a device-only type, and since libcu++ is a heterogeneous library, half complex would probably need to live outside cuda::std:: as an extension.

I'd like to see this implemented personally. There might be other work items that take precedence, but it might be doable in 2.0.0 if it doesn't diverge tremendously from std::complex.

wmaxey, Dec 18 '20 20:12

Thanks for the quick reply, Wesley! I took a quick look at the namespace convention of libcu++; do you mean the support of half complex will likely live in cuda::device:: or just cuda::?

Regardless of where it lands eventually, I think there are always some template and/or macro tricks that let us access the right namespace based on the real type, so I don't think it's a big deal as long as it's somewhere 🙂
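One such template trick could look like the sketch below: a trait that maps a real type to its complex type, wherever that type happens to live. Everything in `ext_sketch` is a hypothetical placeholder (libcu++ has not decided on a namespace or a type name), and `std::complex` stands in for `cuda::std::complex` so that the example compiles on the host.

```cpp
#include <complex>
#include <cstdint>
#include <type_traits>

// Hypothetical placeholder namespace; the real location is undecided.
namespace ext_sketch {
struct half_t { std::uint16_t bits; };   // stand-in for __half (16 bits of storage)
struct complex_half { half_t re, im; };  // stand-in for a future half complex type
}  // namespace ext_sketch

// Primary template: float and double use the standard-library-style complex.
template <typename Real>
struct complex_for {
    using type = std::complex<Real>;
};

// Specialization: the half type is routed to the extension namespace.
template <>
struct complex_for<ext_sketch::half_t> {
    using type = ext_sketch::complex_half;
};

// Convenience alias so callers never spell out a namespace themselves.
template <typename Real>
using complex_for_t = typename complex_for<Real>::type;

static_assert(std::is_same<complex_for_t<float>, std::complex<float>>::value,
              "float maps to the standard complex");
static_assert(std::is_same<complex_for_t<ext_sketch::half_t>,
                           ext_sketch::complex_half>::value,
              "half maps to the extension type");
// On typical platforms two 16-bit halves pack into 32 bits, hence "complex32".
static_assert(sizeof(ext_sketch::complex_half) == 4, "complex32 occupies 4 bytes");
```

Calling code would then write `complex_for_t<T>` everywhere, and only the single specialization would need to change once the final namespace is known.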

Is there an expected timeline for 2.0.0? I see that complex numbers are listed in 1.4.0 so I suppose it refers to the single & double complexes.

One last query (I hope!): Will libcu++ work with NVRTC? In CuPy most of our kernels are compiled by NVRTC because, like in several NVIDIA RAPIDS libraries, the number of kernels we support is exponentially large, and there's no way to precompile them all. But I am a bit worried that NVRTC is not listed in the supported compilers, although I know it's a strictly C++ compiler so hopefully it'd just work.

leofang, Dec 19 '20 02:12

> do you mean the support of half complex will likely live in cuda::device:: or just cuda::

Where it would end up is probably open for discussion, but I don't have any opinion at the moment.

> Is there an expected timeline for 2.0.0?

As for a timeline, 2.0.0 is planned for mid-February, but there's a lot of contention for bandwidth. Things like std::atomic_ref are in progress, as well as NVC++ support, so there's no telling how much time commitment we'll get for this in 2.0.0.

> I see that complex numbers are listed in 1.4.0 so I suppose it refers to the single & double complexes.

Correct, libcu++ has support for single and double complex today.

> Will libcu++ work with NVRTC?

Almost everything in libcu++ works fine with NVRTC. There are some exceptions, but they would most likely not come up in general use. Almost all of our heterogeneous testing today includes NVRTC passes.

wmaxey, Dec 19 '20 04:12

For the record: @cliffburdick asked for bf16 support in #153.

leofang, May 19 '23 19:05

Closing in favor of https://github.com/NVIDIA/cccl/issues/525

jrhemstad, Nov 01 '23 17:11