cccl
cccl copied to clipboard
[FEA]: Thrust-NVRTC Support
Is this a duplicate?
- [x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
Thrust
Is your feature request related to a problem? Please describe.
CUDF is adopting JIT for kernel compilation via JITIFY/NVRTC.
NVRTC unlike NVCC requires that the source files only contain device code. No host code or headers, even if they aren't used. Support for it is not planned either. This has prevented us from adopting JITIFY for our device kernels, as our dependencies don't support this use case either.
We need Thrust to support inclusion from offline-compiled device-only JITIFY code.
Describe the solution you'd like
- Disable code sections that are not supported in JITIFY code when in JITIFY mode.
- Use macros to only enable sequential algorithm specializations (i.e.
thrust::seq) when in JITIFY mode - Prevent inclusion of host headers in JITIFY mode (i.e.,
pthread.h). Ideally, it should only includecuda/std/*headers - Add a CI job specifically for JITIFY compilation tests
Describe alternatives you've considered
Patching thrust to support JITIFY code - Not feasible or practical
Additional context
Here are some of the thrust headers we use in CUDF's device code:
- thrust/advance.h
- thrust/distance.h
- thrust/equal.h
- thrust/fill.h
- thrust/find.h
- thrust/generate.h
- thrust/iterator/constant_iterator.h
- thrust/iterator/counting_iterator.h
- thrust/iterator/discard_iterator.h
- thrust/iterator/reverse_iterator.h
- thrust/iterator/transform_input_output_iterator.h
- thrust/iterator/transform_iterator.h
- thrust/iterator/transform_output_iterator.h
- thrust/iterator/zip_iterator.h
- thrust/limits.h
- thrust/logical.h
- thrust/memory.h
- thrust/mismatch.h
- thrust/optional.h
- thrust/pair.h
- thrust/transform_reduce.h
- thrust/tuple.h
Thanks @lamarrr.
I think to down scope this a bit, it would helpful if you could go through and do a pass first of looking at what headers and symbols in libcudf can just be replaced with cuda/std equivalents. Everything in cuda/std is already tested with NVRTC today and it will be less work for libcudf to update its usage of thrust:: symbols to cuda::std:: symbols:
For example, without actually looking, I believe all of the following have direct replacements in cuda/std:
- thrust/tuple.h
- thrust/pair.h
- thrust/limits.h
- thrust/optional.h
- thrust/advance.h
- thrust/logical.h
- thrust/distance.h
- thrust/mismatch.h
- thrust/equal.h
- thrust/fill.h
- thrust/find.h
- thrust/generate.h
Naively, I believe all that would really remain would be making Thrust's fancy iterators (or replacements in cuda::) work with NVRTC.
AFAIK, all Thrust iterators already work under NVRTC since at least CCCL 3.0, see #3676. Furthermore, if you only need Thrust's sequential algorithm implementations, you will be able to get by with just <cuda/std/algorithm>, which is almost finished here: #3741. limits.h, pair.h, tuple.h, optional.h have direct equivalent's in <cuda/std/...> as well.
Once #3741 lands, let's narrow down your list to what's still missing.
do a pass first of looking at what headers and symbols in libcudf can just be replaced with cuda/std equivalents.
Agreed. Most of the issues should be solved if we use the cuda::std:: equivalents. In our offline discussions, @lamarrr noted that the zip iterator is a more complex fancy iterator to handle compared to others. A potential workaround is to use a counting iterator together with a transform iterator instead: https://godbolt.org/z/5fdMqecoP
I think to down scope this a bit, it would helpful if you could go through and do a pass first of looking at what headers and symbols in libcudf can just be replaced with
cuda/stdequivalents. Everything incuda/stdis already tested with NVRTC today and it will be less work for libcudf to update its usage ofthrust::symbols tocuda::std::symbols:
Yeah, we are already using cuda/std/ headers with NVRTC.
Awesome! We'll update them to use cuda::std:: where we can.
Some of the work is already underway: https://github.com/rapidsai/cudf/pull/18427, I'll replace the other facilities; I don't expect it to be straightforward, as we have had some subtle bugs in past attempts, cc: @PointKernel. I also had the impression it would be beneficial to other users.
Naively, I believe all that would really remain would be making Thrust's fancy iterators (or replacements in
cuda::) work with NVRTC.
Yes
AFAIK, all Thrust iterators already work under NVRTC since at least CCCL 3.0
Awesome! Thanks! My tests were with the 2.8 version.
... which is almost finished here: https://github.com/NVIDIA/cccl/pull/3741...
That's really great news. Thanks!