Periodic stuttering with gpu_power and multiple GPUs

Open gnusenpai opened this issue 1 year ago • 6 comments

On multi-GPU NVIDIA systems, there is stuttering every time MangoHud polls the power usage.

Using pci_dev or gpu_list to exclude the non-render GPU does not fix the stuttering. It seems MangoHud is still polling it even when its stats aren't being shown.

Enabling either persistence mode in the NVIDIA driver doesn't help.

This might not strictly be a MangoHud issue, but it doesn't occur on versions before v0.8.0.

Test System

  • Gentoo
  • Hyprland
  • MangoHud master (e906c6b)
  • NVIDIA 570.133.07
  • RTX 3080 + GTX 1080 Ti

To Reproduce

  1. Install MangoHud v0.8.0 or higher
  2. Run a test with MANGOHUD_CONFIG=gpu_power MANGOHUD=1 vkcube

Expected behavior

Smooth frametimes are possible by one of:

  1. downgrading to v0.7.2
  2. disabling gpu_power in config
  3. removing 2nd GPU

Mar 24 '25 06:03 gnusenpai

I found a hack that works for my setup specifically:

diff --git a/src/gpu.cpp b/src/gpu.cpp
index 170d1a9..75d1806 100644
--- a/src/gpu.cpp
+++ b/src/gpu.cpp
@@ -79,6 +79,11 @@ GPUS::GPUS(overlay_params* params) : params(params) {
             }
         }
 
+        if (vendor_id == 0x10de && device_id == 0x1b06) {
+            SPDLOG_INFO("found bad GPU, skipping...");
+            continue;
+        }
+
         std::shared_ptr<GPU> ptr = std::make_shared<GPU>(node_name, vendor_id, device_id, pci_dev);
 
         if (params->gpu_list.size() == 1 && params->gpu_list[0] == idx++)

It probably wouldn't take too much more to turn this into a proper fix.
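
A more general version of the check could key off the user's existing filters instead of a hard-coded device ID. The following is only a sketch of that idea, not MangoHud's code: `GpuCandidate` and `should_poll` are hypothetical names, and it assumes `gpu_list` holds enumeration indices and `pci_dev` holds a PCI address string, with empty values meaning "no filter set".

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical description of one detected GPU (not MangoHud's actual type).
struct GpuCandidate {
    unsigned index;       // position in enumeration order
    std::string pci_dev;  // PCI address, e.g. "0000:01:00.0"
};

// Hypothetical filter: returns true if the GPU should be constructed and
// polled at all. An empty gpu_list or pci_dev is treated as "no filter".
bool should_poll(const GpuCandidate& gpu,
                 const std::vector<unsigned>& gpu_list,
                 const std::string& pci_dev) {
    // A pci_dev filter excludes every GPU with a different address.
    if (!pci_dev.empty() && gpu.pci_dev != pci_dev)
        return false;
    // A gpu_list filter excludes every GPU whose index is not listed.
    if (!gpu_list.empty() &&
        std::find(gpu_list.begin(), gpu_list.end(), gpu.index) == gpu_list.end())
        return false;
    return true;
}
```

Called at the same point as the hack above, before the `GPU` object is created, a check like this would keep an excluded GPU from ever being queried, rather than only hiding its stats.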

Mar 24 '25 12:03 gnusenpai

> SPDLOG_INFO("found bad GPU, skipping...");

So the issue is not having two GPUs but accessing metrics on that specific GPU?

Mar 24 '25 21:03 flightlessmango

> So the issue is not having two GPUs but accessing metrics on that specific GPU?

Don't read too much into the language. Looking for that GPU specifically was just the obvious easy fix for testing. I'm pretty sure accessing metrics for the non-render GPU is what the actual problem is.

I can try to confirm this by swapping the GPUs around, maybe finding another one to plug in.
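
Besides swapping cards, one way to narrow this down would be to time each device's query separately and see which one blocks. A minimal sketch of that, with the query passed in as a callable so it does not depend on NVML; `time_poll_ms` and `fake_slow_poll` are hypothetical names, and the sleeping stand-in only imitates a stalling query:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Runs `poll` once and returns how long the call blocked, in milliseconds.
// `poll` stands in for a single per-device query such as a power read.
double time_poll_ms(const std::function<void()>& poll) {
    const auto start = std::chrono::steady_clock::now();
    poll();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Stand-in for a query that stalls; a real test would call the driver here.
void fake_slow_poll() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
}
```

Wrapping the real per-GPU power query in a lambda and logging the result for each device would show whether only the non-render GPU's query is slow.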

Mar 25 '25 02:03 gnusenpai

Well I switched my 1080Ti to be the primary GPU and removed my hack. No stuttering, weird. Either Pascal is a bit broken or NVIDIA gives some sort of special treatment to the GPU with the efifb or something. However, my shell's bar uses NVML to access power information from the GPUs the same way MangoHud is doing and it works just fine. No idea what's going on here...

Mar 27 '25 09:03 gnusenpai

I came across the same issue recently. I have a GTX 1080, nearly the same as your Ti version. I turned off the GPU power display for MangoHud in GOverlay, and now the FPS is stable. So if anyone comes across the same issue, just turn off GPU power in the MangoHud config. BTW I use CachyOS. :)

Apr 03 '25 13:04 Draguljche

> However, my shell's bar uses NVML to access power information from the GPUs the same way MangoHud is doing and it works just fine.

Not sure what changed between then and now, but I've had to disable power monitoring for my other GPU in my shell because it was causing system-wide stuttering.

Apr 19 '25 16:04 gnusenpai