[FEA]: Investigate if NVTX ranges in CUB algorithms support graph capture
Is this a duplicate?
- [X] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
CUB
Is your feature request related to a problem? Please describe.
As of https://github.com/NVIDIA/cccl/issues/719, we have NVTX ranges in CUB device algorithms. Most CUB device algorithms also support graph capture. It is currently unclear whether the NVTX ranges work correctly when an algorithm runs under graph capture.
Describe the solution you'd like
We need to understand whether NVTX ranges work correctly when CUB is in graph capture mode. Since all of our *_.lid_2 tests run CUB algorithms in graph capture mode, one of these tests, say cub.cpp17.test.device_select_if.lid_2, can be used as an example. If the NVTX ranges do not contain the kernels they surround, I'd prefer that no NVTX ranges be reported.
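For anyone who wants a standalone reproducer instead of the test, here is a minimal sketch of the scenario. It is not how the lid_2 tests or CUB's internal NVTX ranges are implemented: the explicit `nvtxRangePushA` / `nvtxRangePop` calls are a stand-in for the ranges CUB emits, and `is_even`, `CUDA_CHECK`, and `count_even_under_capture` are hypothetical names used only for illustration. It assumes a CUDA toolkit with `cudaGraphInstantiateWithFlags` (11.4 or newer) and the header-only NVTX v3 that ships with it.

```cuda
#include <cub/device/device_select.cuh>
#include <cuda_runtime.h>
#include <nvtx3/nvToolsExt.h>

#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Abort on any CUDA error so a failed capture is not silently ignored.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(err_));                     \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Hypothetical predicate: keep even values.
struct is_even {
  __host__ __device__ bool operator()(int x) const { return x % 2 == 0; }
};

// Captures one cub::DeviceSelect::If call into a CUDA graph, replays the
// graph once, and returns the number of selected items.
int count_even_under_capture() {
  const int num_items = 8;
  const int h_in[num_items] = {0, 1, 2, 3, 4, 5, 6, 7};

  int* d_in = nullptr;
  int* d_out = nullptr;
  int* d_num_selected = nullptr;
  CUDA_CHECK(cudaMalloc(&d_in, sizeof(h_in)));
  CUDA_CHECK(cudaMalloc(&d_out, sizeof(h_in)));
  CUDA_CHECK(cudaMalloc(&d_num_selected, sizeof(int)));
  CUDA_CHECK(cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice));

  // Query and allocate temporary storage before capture begins, because
  // cudaMalloc is not safe to call while a stream is capturing globally.
  void* d_temp = nullptr;
  std::size_t temp_bytes = 0;
  CUDA_CHECK(cub::DeviceSelect::If(d_temp, temp_bytes, d_in, d_out,
                                   d_num_selected, num_items, is_even{}));
  CUDA_CHECK(cudaMalloc(&d_temp, temp_bytes));

  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));

  // Range opened and closed while the stream is capturing: kernels are
  // recorded into the graph here, not executed.
  cudaGraph_t graph;
  CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
  nvtxRangePushA("capture: DeviceSelect::If");
  CUDA_CHECK(cub::DeviceSelect::If(d_temp, temp_bytes, d_in, d_out,
                                   d_num_selected, num_items, is_even{},
                                   stream));
  nvtxRangePop();
  CUDA_CHECK(cudaStreamEndCapture(stream, &graph));

  // Range around the replay, where the kernels actually run.
  cudaGraphExec_t exec;
  CUDA_CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
  nvtxRangePushA("launch: graph");
  CUDA_CHECK(cudaGraphLaunch(exec, stream));
  CUDA_CHECK(cudaStreamSynchronize(stream));
  nvtxRangePop();

  int h_num_selected = 0;
  CUDA_CHECK(cudaMemcpy(&h_num_selected, d_num_selected, sizeof(int),
                        cudaMemcpyDeviceToHost));

  CUDA_CHECK(cudaGraphExecDestroy(exec));
  CUDA_CHECK(cudaGraphDestroy(graph));
  CUDA_CHECK(cudaStreamDestroy(stream));
  CUDA_CHECK(cudaFree(d_temp));
  CUDA_CHECK(cudaFree(d_num_selected));
  CUDA_CHECK(cudaFree(d_out));
  CUDA_CHECK(cudaFree(d_in));
  return h_num_selected;
}
```

Calling `count_even_under_capture()` from a `main` and profiling the binary with Nsight Systems should show whether the kernels land under the "capture" range or only under the "launch" range. Since work recorded during stream capture is not executed until the graph is launched, the capture-time range is the one in question here.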
Describe alternatives you've considered
No response
Additional context
No response