[FEA] Ability to disable RMM NVTX without disabling all NVTX
Is your feature request related to a problem? Please describe.
After #336 I noticed that our nsys profiles are significantly larger because of the RMM NVTX ranges being generated. We have a fairly complex allocator stack, which generates many nested allocate ranges on every memory allocation, and allocation/free events are very common in the timeline. In practice we don't need to see the RMM details, so we'd like to disable these ranges. There is a USE_NVTX flag that controls whether RMM uses NVTX ranges, but unfortunately libcudf uses the same flag. We find the libcudf ranges very useful and would like to keep them, but it does not seem possible to enable libcudf NVTX while disabling RMM NVTX.
Describe the solution you'd like
Rename RMM's USE_NVTX CMake flag to RMM_USE_NVTX, and have it default to the value of USE_NVTX if that variable is defined, or to ON otherwise. This should be backwards compatible, since USE_NVTX=ON or USE_NVTX=OFF will still control whether RMM enables or disables NVTX ranges. However, it also allows projects to control NVTX in RMM alone by setting RMM_USE_NVTX.
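The defaulting rule described above could look roughly like this in RMM's CMakeLists.txt. This is only a sketch of the proposal, not existing RMM code: the helper variable `_rmm_nvtx_default` and the help string are hypothetical, and how the resulting option is wired into compile definitions is left out.

```cmake
# Sketch only: pick the default for RMM_USE_NVTX from USE_NVTX when the
# parent project (or the command line) has defined it, and ON otherwise.
# `_rmm_nvtx_default` is a hypothetical helper name.
if(DEFINED USE_NVTX)
  set(_rmm_nvtx_default ${USE_NVTX})
else()
  set(_rmm_nvtx_default ON)
endif()

# option() only applies the default when RMM_USE_NVTX is not already set,
# so an explicit -DRMM_USE_NVTX=... takes precedence over USE_NVTX.
option(RMM_USE_NVTX "Build RMM with NVTX ranges" ${_rmm_nvtx_default})
```

Under this sketch, configuring with `-DUSE_NVTX=ON -DRMM_USE_NVTX=OFF` would leave USE_NVTX on for a parent project that reads it, such as libcudf, while turning the ranges off in RMM. Setting only USE_NVTX would behave as it does today.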
Describe alternatives you've considered
Add a separate CMake flag that, when set, disables NVTX in RMM even if USE_NVTX is on. This would be less flexible, as it would not allow a configuration where RMM has NVTX enabled but the parent project, which also uses USE_NVTX, does not.
I think we probably want both a way to disable RMM NVTX completely and, possibly, some control over granularity. When debugging allocation bottlenecks, it's helpful to have finer-grained NVTX ranges on your timeline. The rest of the time you want far fewer, and especially no NVTX ranges for very rapid and frequent calls.