cppuprofile Adding multi-GPU support for Nvidia monitoring

Open • herrnils opened this issue 1 year ago • 1 comment

This MR adds the ability to monitor multiple Nvidia GPUs through nvidia-smi calls and refactors NvidiaMonitor. In detail, the changes include:

- Exposing the CMake variable GPU_MONITOR_NVIDIA as a C++ macro so that GPU monitoring is enabled automatically in the example sample/main.cpp.
- Simplifying the lib/igpumonitor.h abstract base class to the two basic functions getUsage and getMemory.
- Reworking NvidiaMonitor, which changed the most:
  1. The threading via watchGPU that read the nvidia-smi output was removed and replaced with simpler popen(...) calls, with no need for forked processes.
  2. The containers holding GPU usage, used memory, and total memory were changed to std::vectors so that they can hold the monitored values for multiple GPUs. During construction of the NvidiaMonitor object, the number of available GPUs in the system is read from nvidia-smi and the vectors are initialized.
  3. The function void update_gpu_data(const std::vector<uprofile::NvidiaMonitor::Data>& data) was added to update the usage and/or memory vectors, depending on the passed data argument.
- Removing the watching checks in UProfileImpl::dumpGpuUsage() and UProfileImpl::dumpGpuMemory(), since they are no longer needed.
- Updating the show-graph tool to plot usage and memory for multiple GPUs.
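
For readers unfamiliar with the popen-based approach, the following is a minimal sketch of how per-GPU values could be read from nvidia-smi into a std::vector, one entry per GPU. It is not the code from this MR: the GpuSample struct and the runCommand, parseGpuCsv, and queryAllGpus helpers are hypothetical names chosen for illustration, and the sketch assumes a POSIX system where popen is available. The nvidia-smi query flags used are the standard CSV query options.

```cpp
#include <cstdio>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-GPU sample; not the project's NvidiaMonitor::Data type.
struct GpuSample {
    float usagePercent;
    int usedMemMiB;
    int totalMemMiB;
};

// Run a shell command through popen and return everything it wrote to stdout.
std::string runCommand(const std::string& cmd)
{
    FILE* pipe = popen(cmd.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("popen failed for: " + cmd);
    }
    std::string output;
    char buffer[256];
    while (fgets(buffer, sizeof(buffer), pipe) != nullptr) {
        output += buffer;
    }
    pclose(pipe);
    return output;
}

// Parse CSV text of the form "usage, used, total", one line per GPU.
// Lines that do not contain three numbers (for example "[N/A]" fields)
// are skipped.
std::vector<GpuSample> parseGpuCsv(const std::string& text)
{
    std::vector<GpuSample> samples;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        GpuSample s{};
        const int matched = std::sscanf(line.c_str(), "%f , %d , %d",
                                        &s.usagePercent, &s.usedMemMiB, &s.totalMemMiB);
        if (matched == 3) {
            samples.push_back(s);
        }
    }
    return samples;
}

// Query all GPUs in one nvidia-smi call; the vector index is the GPU index.
// Returns an empty vector when nvidia-smi is missing or reports no GPU.
std::vector<GpuSample> queryAllGpus()
{
    return parseGpuCsv(runCommand(
        "nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total "
        "--format=csv,noheader,nounits 2>/dev/null"));
}
```

Calling queryAllGpus() on a machine with two GPUs would be expected to return two samples, and the size of the returned vector could serve as the GPU count used to initialize the containers. Keeping the parsing separate from the popen call makes the parser testable without a GPU present.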

Aug 14 '24 16:08 herrnils
