[FEA]: Disable NVTX ranges by default for `thrust::seq` Thrust algorithms
Is this a duplicate?
- [x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
Thrust
Is your feature request related to a problem? Please describe.
As a user of CCCL's thrust:: algorithms and containers with a non-CUDA backend, I am unlikely to be interested in capturing NVTX ranges for the execution of those algorithms. Furthermore, capturing an NVTX range is much more likely to introduce noticeable overhead in that case: the cost of starting/stopping the range is small relative to a kernel launch, but can be large compared to CPU-only work.
Describe the solution you'd like
I'd like NVTX ranges to not be captured by default when a thrust:: algorithm is invoked with an execution policy other than thrust::cuda::par.
Describe alternatives you've considered
No response
Additional context
We could consider adding a compile-time flag to enable capturing Thrust host ranges by default, but I think YAGNI applies here.
I assume we want to have NVTX ranges for any execution policy derived from thrust::cuda_cub::execution_policy. That includes cuda::tag{}, cuda::par_nosync, cuda::par.on(...) (which is a different type than decltype(cuda::par)).
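To make the intended rule concrete, here is a minimal C++17 sketch of a compile-time check that opens a range only for policies deriving from a CUDA policy base. Everything in the `sketch` namespace is a hypothetical stand-in invented for illustration (the policy types, `emits_nvtx_range_v`, the `ranges_opened` counter, and `sum`); it does not use the real Thrust or NVTX headers and is not how the fixing PR is necessarily implemented.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Hypothetical CRTP bases standing in for the CUDA and host policy hierarchies.
template <class Derived> struct cuda_execution_policy {};
template <class Derived> struct host_execution_policy {};

// Hypothetical concrete policies. cuda_par_on_stream_t models the fact that
// par.on(...) yields a different type than par itself.
struct cuda_par_t : cuda_execution_policy<cuda_par_t> {};
struct cuda_par_nosync_t : cuda_execution_policy<cuda_par_nosync_t> {};
struct cuda_par_on_stream_t : cuda_execution_policy<cuda_par_on_stream_t> {};
struct seq_t : host_execution_policy<seq_t> {};

// True only when Policy derives from the CUDA policy base.
template <class Policy>
inline constexpr bool emits_nvtx_range_v =
    std::is_base_of_v<cuda_execution_policy<Policy>, Policy>;

// Counter standing in for real NVTX range start/stop calls.
inline int ranges_opened = 0;

// Toy algorithm: sums [first, last) and "opens a range" only for CUDA policies.
template <class Policy>
int sum(const Policy&, const int* first, const int* last) {
  if constexpr (emits_nvtx_range_v<Policy>) {
    ++ranges_opened;  // a real implementation would open a scoped NVTX range here
  }
  int total = 0;
  for (; first != last; ++first) {
    total += *first;
  }
  return total;
}

// Compile-time summary of the proposed behavior.
static_assert(emits_nvtx_range_v<cuda_par_t>);
static_assert(emits_nvtx_range_v<cuda_par_nosync_t>);
static_assert(emits_nvtx_range_v<cuda_par_on_stream_t>);
static_assert(!emits_nvtx_range_v<seq_t>);

}  // namespace sketch
```

Because the decision is made with `if constexpr`, the sequential path in this sketch contains no range-related code at all, which matches the goal of adding no overhead to CPU-only work.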
@bernhardmgruber
I assume we want have NVTX ranges
you mean "we want to have", right? and not "we won't have".
you mean "we want to have", right? and not "we won't have"
Yes. I want to remain with the status quo and emit NVTX ranges for any execution policy that targets the CUDA backend.
@bernhardmgruber the fixing PR does just that: it disables NVTX ranges only for thrust::seq and its derivatives.