cccl icon indicating copy to clipboard operation
cccl copied to clipboard

[FEA]: Thrust-NVRTC Support

Open lamarrr opened this issue 8 months ago • 5 comments

Is this a duplicate?

  • [x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

Thrust

Is your feature request related to a problem? Please describe.

CUDF is adopting JIT for kernel compilation via JITIFY/NVRTC.

NVRTC unlike NVCC requires that the source files only contain device code. No host code or headers, even if they aren't used. Support for it is not planned either. This has prevented us from adopting JITIFY for our device kernels, as our dependencies don't support this use case either.

We need Thrust to support inclusion from offline-compiled device-only JITIFY code.

Describe the solution you'd like

  • Disable code sections that are not supported in JITIFY code when in JITIFY mode.
  • Use macros to only enable sequential algorithm specializations (i.e. thrust::seq) when in JITIFY mode
  • Prevent inclusion of host headers in JITIFY mode (i.e., pthread.h). Ideally, it should only include cuda/std/* headers
  • Add a CI job specifically for JITIFY compilation tests

Describe alternatives you've considered

Patching thrust to support JITIFY code - Not feasible or practical

Additional context

Here are some of the thrust headers we use in CUDF's device code:

  • thrust/advance.h
  • thrust/distance.h
  • thrust/equal.h
  • thrust/fill.h
  • thrust/find.h
  • thrust/generate.h
  • thrust/iterator/constant_iterator.h
  • thrust/iterator/counting_iterator.h
  • thrust/iterator/discard_iterator.h
  • thrust/iterator/reverse_iterator.h
  • thrust/iterator/transform_input_output_iterator.h
  • thrust/iterator/transform_iterator.h
  • thrust/iterator/transform_output_iterator.h
  • thrust/iterator/zip_iterator.h
  • thrust/limits.h
  • thrust/logical.h
  • thrust/memory.h
  • thrust/mismatch.h
  • thrust/optional.h
  • thrust/pair.h
  • thrust/transform_reduce.h
  • thrust/tuple.h

lamarrr avatar Apr 01 '25 18:04 lamarrr

Thanks @lamarrr.

I think to down scope this a bit, it would helpful if you could go through and do a pass first of looking at what headers and symbols in libcudf can just be replaced with cuda/std equivalents. Everything in cuda/std is already tested with NVRTC today and it will be less work for libcudf to update its usage of thrust:: symbols to cuda::std:: symbols:

For example, without actually looking, I believe all of the following have direct replacements in cuda/std:

  • thrust/tuple.h
  • thrust/pair.h
  • thrust/limits.h
  • thrust/optional.h
  • thrust/advance.h
  • thrust/logical.h
  • thrust/distance.h
  • thrust/mismatch.h
  • thrust/equal.h
  • thrust/fill.h
  • thrust/find.h
  • thrust/generate.h

Naively, I believe all that would really remain would be making Thrust's fancy iterators (or replacements in cuda::) work with NVRTC.

jrhemstad avatar Apr 01 '25 19:04 jrhemstad

AFAIK, all Thrust iterators already work under NVRTC since at least CCCL 3.0, see #3676. Furthermore, if you only need Thrust's sequential algorithm implementations, you will be able to get by with just <cuda/std/algorithm>, which is almost finished here: #3741. limits.h, pair.h, tuple.h, optional.h have direct equivalent's in <cuda/std/...> as well.

Once #3741 lands, let's narrow down your list to what's still missing.

bernhardmgruber avatar Apr 01 '25 19:04 bernhardmgruber

do a pass first of looking at what headers and symbols in libcudf can just be replaced with cuda/std equivalents.

Agreed. Most of the issues should be solved if we use the cuda::std:: equivalents. In our offline discussions, @lamarrr noted that the zip iterator is a more complex fancy iterator to handle compared to others. A potential workaround is to use a counting iterator together with a transform iterator instead: https://godbolt.org/z/5fdMqecoP

PointKernel avatar Apr 01 '25 19:04 PointKernel

I think to down scope this a bit, it would helpful if you could go through and do a pass first of looking at what headers and symbols in libcudf can just be replaced with cuda/std equivalents. Everything in cuda/std is already tested with NVRTC today and it will be less work for libcudf to update its usage of thrust:: symbols to cuda::std:: symbols:

Yeah, we are already using cuda/std/ headers with NVRTC. Awesome! We'll update them to use cuda::std:: where we can. Some of the work is already underway: https://github.com/rapidsai/cudf/pull/18427, I'll replace the other facilities; I don't expect it to be straightforward, as we have had some subtle bugs in past attempts, cc: @PointKernel. I also had the impression it would be beneficial to other users.

Naively, I believe all that would really remain would be making Thrust's fancy iterators (or replacements in cuda::) work with NVRTC.

Yes

lamarrr avatar Apr 04 '25 18:04 lamarrr

AFAIK, all Thrust iterators already work under NVRTC since at least CCCL 3.0

Awesome! Thanks! My tests were with the 2.8 version.

... which is almost finished here: https://github.com/NVIDIA/cccl/pull/3741...

That's really great news. Thanks!

lamarrr avatar Apr 04 '25 18:04 lamarrr