cppuprofile

Adding multi-GPU support for Nvidia monitoring

Open herrnils opened this issue 1 year ago • 1 comments

This MR adds the ability to monitor multiple Nvidia GPUs through nvidia-smi calls. It also refactors the NvidiaMonitor. In detail, the changes include:

  • Exposing the cmake variable GPU_MONITOR_NVIDIA as a C++ macro to automatically enable GPU monitoring in the example sample/main.cpp.
  • Simplifying the lib/igpumonitor.h abstract base class to the two basic functions getUsage and getMemory.
  • The NvidiaMonitor changed the most:
    1. The threading via watchGPU that read in the nvidia-smi output was removed and replaced with simpler popen(...) calls, so forked processes are no longer needed.
    2. The containers holding GPU usage, used memory, and total memory were changed to std::vectors so they can hold the monitored information for multiple GPUs. During construction of the NvidiaMonitor object, the number of GPUs available in the system is collected from nvidia-smi and the vectors are initialized.
    3. The function void update_gpu_data(const std::vector<uprofile::NvidiaMonitor::Data>& data) was added to update the usage and/or memory vectors, depending on the passed data argument.
  • The watching checks for the UProfileImpl::dumpGpuUsage() and UProfileImpl::dumpGpuMemory() functions were removed since they are no longer needed.
  • The show-graph tool was updated to plot usage and memory for multiple GPUs.
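
To illustrate the popen-based approach with one vector entry per GPU, here is a minimal sketch. It is not the code from this MR: the names GpuSamples, runCommand, and parseGpuCsv are hypothetical, and it assumes nvidia-smi is queried in CSV mode so that each output line describes one GPU as "usage, used memory, total memory".

```cpp
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container (not the MR's actual type): index i holds GPU i.
struct GpuSamples {
    std::vector<float> usage;   // utilization in percent
    std::vector<int> usedMem;   // used memory in MiB
    std::vector<int> totalMem;  // total memory in MiB
};

// Run a shell command through popen and return everything it wrote to stdout.
// Returns an empty string if the pipe could not be opened.
std::string runCommand(const char* cmd) {
    std::string out;
    FILE* pipe = popen(cmd, "r");
    if (!pipe) {
        return out;
    }
    char buf[256];
    while (fgets(buf, sizeof(buf), pipe)) {
        out += buf;
    }
    pclose(pipe);
    return out;
}

// Parse CSV lines of the assumed form "usage, used, total", one line per GPU.
// Lines that do not match (empty lines, "[N/A]" fields, ...) are skipped.
GpuSamples parseGpuCsv(const std::string& csv) {
    GpuSamples samples;
    std::istringstream lines(csv);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        float usage = 0.0f;
        int used = 0;
        int total = 0;
        char sep1 = 0;
        char sep2 = 0;
        if (fields >> usage >> sep1 >> used >> sep2 >> total &&
            sep1 == ',' && sep2 == ',') {
            samples.usage.push_back(usage);
            samples.usedMem.push_back(used);
            samples.totalMem.push_back(total);
        }
    }
    return samples;
}
```

On a machine with the Nvidia driver installed, the two helpers could be combined as parseGpuCsv(runCommand("nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits")). Keeping the parser separate from the popen call means it can be exercised with canned strings on systems without a GPU.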

herrnils, Aug 14 '24 16:08