cccl [FEA]: Disable NVTX ranges by default for `thrust::seq` Thrust algorithms

Area

Thrust

Is your feature request related to a problem? Please describe.

As a user of CCCL's thrust:: algorithms and containers with a non-CUDA backend, I am unlikely to be interested in capturing NVTX ranges for the execution of those algorithms. Furthermore, capturing an NVTX range is much more likely to introduce noticeable overhead there: the cost of starting/stopping the range is small relative to a kernel launch, but can be large compared to CPU-only work.

Describe the solution you'd like

I'd like NVTX ranges to not be captured by default when a thrust:: algorithm is invoked with an execution policy other than thrust::cuda::par.

Describe alternatives you've considered

No response

Additional context

We could consider adding a compile-time flag to enable capturing Thrust host ranges by default, but I think YAGNI applies here.

Oct 20 '25 18:10 jrhemstad

I assume we want to have NVTX ranges for any execution policy derived from thrust::cuda_cub::execution_policy. That includes cuda::tag{}, cuda::par_nosync, cuda::par.on(...) (which is a different type than decltype(cuda::par)).
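To make the "derived from" rule concrete, here is a minimal C++17 sketch of how such a check could look. Everything in namespace `sketch` is a hypothetical stand-in that only models the shape of the policy hierarchy; none of it is a real Thrust type, and the trait name `emits_nvtx_range_v` is an assumption made up for illustration, not how the actual fix is implemented.

```cpp
#include <type_traits>

// Hypothetical stand-ins for illustration only; NOT the real Thrust types.
namespace sketch {

// Models a CRTP policy base such as thrust::cuda_cub::execution_policy<Derived>.
template <class Derived>
struct cuda_execution_policy {};

// Models the policy base of the sequential backend.
template <class Derived>
struct seq_execution_policy {};

// Distinct policy types that all target the (modelled) CUDA backend.
struct cuda_tag : cuda_execution_policy<cuda_tag> {};
struct cuda_par_t : cuda_execution_policy<cuda_par_t> {};
struct cuda_par_nosync_t : cuda_execution_policy<cuda_par_nosync_t> {};
// Models the object returned by par.on(stream), whose type differs from par's.
struct cuda_par_on_stream_t : cuda_execution_policy<cuda_par_on_stream_t> {};

// Models thrust::seq.
struct seq_t : seq_execution_policy<seq_t> {};

// Hypothetical trait: true only when Policy derives from the CUDA policy base.
template <class Policy>
inline constexpr bool emits_nvtx_range_v =
    std::is_base_of_v<cuda_execution_policy<Policy>, Policy>;

} // namespace sketch

// Every CUDA-derived policy keeps its range; the sequential policy does not.
static_assert(sketch::emits_nvtx_range_v<sketch::cuda_tag>);
static_assert(sketch::emits_nvtx_range_v<sketch::cuda_par_t>);
static_assert(sketch::emits_nvtx_range_v<sketch::cuda_par_nosync_t>);
static_assert(sketch::emits_nvtx_range_v<sketch::cuda_par_on_stream_t>);
static_assert(!sketch::emits_nvtx_range_v<sketch::seq_t>);
```

Keying on the base class rather than on one concrete type such as `decltype(cuda::par)` is what lets stream-bound and no-sync variants be covered without listing each of them.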

Oct 20 '25 19:10 bernhardmgruber

@bernhardmgruber

I assume we want have NVTX ranges

you mean "we want to have", right? and not "we won't have".

Oct 31 '25 00:10 gonidelis

you mean "we want to have", right? and not "we won't have"

Yes. I want to remain with the status quo and emit NVTX ranges for any execution policy that targets the CUDA backend.

Nov 01 '25 21:11 bernhardmgruber

@bernhardmgruber Perfect, the fixing PR does just that: it disables NVTX ranges only for thrust::seq and its derivatives.

Nov 05 '25 18:11 gonidelis