timemory
Upgrade cupti_counters component
- The current implementation of the CUDA hardware (HW) counters uses the CUPTI callback API, which forces a device synchronization at the end of each marker
- This does not necessarily have much impact on performance, since the callback API serializes kernel execution anyway. However, the component should be migrated to use the correlation ID, with counter values attributed to the marker later instead of at a forced synchronization point, because this scheme will be required by the new CUPTI profiling API
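A minimal sketch of the correlation-ID scheme, assuming the marker records each kernel launch's correlation ID when it is issued (CUPTI's callback data carries a `correlationId` field) and that counter results arrive later, possibly out of order. `CorrelationTracker`, `on_launch`, `on_record`, and `total` are hypothetical names for illustration only; they are not part of timemory or CUPTI, and the sketch does not call CUPTI at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical sketch: attribute HW counter values to markers by correlation ID
// so that a marker can end without forcing a device synchronization.
class CorrelationTracker {
public:
    // Called when a kernel is launched inside a marker (e.g. from an API-enter
    // callback): remember which marker owns this correlation ID.
    void on_launch(std::uint32_t correlation_id, const std::string& marker) {
        assert(pending_.count(correlation_id) == 0 && "correlation ID reused while pending");
        pending_[correlation_id] = marker;
    }

    // Called later, when the counter value for a launch becomes available.
    // Returns false if the correlation ID is unknown (never launched or already consumed).
    bool on_record(std::uint32_t correlation_id, std::uint64_t value) {
        auto it = pending_.find(correlation_id);
        if (it == pending_.end()) return false;
        totals_[it->second] += value;  // accumulate into the owning marker
        pending_.erase(it);
        return true;
    }

    // Total counter value attributed to a marker so far (0 if none).
    std::uint64_t total(const std::string& marker) const {
        auto it = totals_.find(marker);
        return it == totals_.end() ? 0 : it->second;
    }

    // Number of launches whose results have not arrived yet.
    std::size_t pending() const { return pending_.size(); }

private:
    std::unordered_map<std::uint32_t, std::string> pending_;
    std::map<std::string, std::uint64_t> totals_;
};
```

For example, if marker `A` launches kernels with correlation IDs 1 and 2 and marker `B` launches ID 3, records for IDs 3, 1, 2 arriving in that order with values 50, 100, 20 leave `total("A") == 120` and `total("B") == 50`. The design choice is that a marker's stop no longer waits on the device; its totals are simply incomplete until every pending correlation ID has been resolved.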