GPU power draw shows wrong numbers with 32bit applications

Open xpander69 opened this issue 9 months ago • 16 comments

Describe the bug: 32-bit games/applications show GPU power draw values in the 100k range or higher.

List relevant hardware/software information

  • Arch Linux
  • MangoHud version: v0.8.1-4-gcd7c6cb
  • GPU: nvidia RTX 3080, 570.124.04 drivers

To Reproduce Steps to reproduce the behavior:

  1. Run MangoHud with any 32-bit application, for example glxgears32 for a quick test

Screenshots 32bit glxgears:

Image

64bit glxgears:

Image

edit: it actually shows the correct power draw for 1-2 seconds and then goes wrong

xpander69 avatar Mar 10 '25 09:03 xpander69

This is Alice: Madness Returns, a game from 2011, played on Steam with Proton.

Image

I'm using Arch Linux with Wayland + Sway, and the following MangoHud versions installed from the official Arch repositories:

  • extra/mangohud 0.8.1-1 [installed]
  • multilib/lib32-mangohud 0.8.1-1 [installed]

My PC's specs:

  • Nvidia RTX 3050 6 GB
  • Intel i7-7700
  • 16 GB DDR4 2400 MHz

Games like Minecraft (native), Hollow Knight (native), Hades (Proton), Resident Evil 3 Remake (Proton), and Severed Steel (GOG-Lutris-Proton) all played fine, so this may be an issue with the 32-bit build.

Have I missed a step needed to fix the random GPU wattage values?

Linerd4 avatar Mar 21 '25 09:03 Linerd4

Same issue on Fedora (Wayland), Nvidia RTX 3080 12GB (driver 570.133.07), MangoHud version 0.8.1-2

JStrategic avatar Mar 30 '25 22:03 JStrategic

It might be an issue with 3000-series GPUs only. I've seen that people on the 4000 series don't have this issue. So it could be some strange Nvidia bug, but then again nvidia-smi reports everything correctly.

xpander69 avatar Mar 31 '25 05:03 xpander69

> It might be an issue with 3000-series GPUs only.

I have a 4070 and it is happening for me as well

ShimSekai avatar Mar 31 '25 15:03 ShimSekai

diff --git a/src/nvidia.cpp b/src/nvidia.cpp
index b983656..e620387 100644
--- a/src/nvidia.cpp
+++ b/src/nvidia.cpp
@@ -118,9 +118,16 @@ void NVIDIA::get_instant_metrics_nvml(struct gpu_metrics *metrics) {

         if (params->enabled[OVERLAY_PARAM_ENABLED_gpu_power] || (logger && logger->is_active())) {
             unsigned int power, limit;
-            nvml.nvmlDeviceGetPowerUsage(device, &power);
+            nvmlReturn_t ret_power = nvml.nvmlDeviceGetPowerUsage(device, &power);
+            if (ret_power != NVML_SUCCESS) {
+                spdlog::debug("nvmlDeviceGetPowerUsage failed: {}", nvml.nvmlErrorString(ret_power));
+                power = 0;
+            } else {
+                spdlog::debug("Raw power usage (mW): {}", power);
+                metrics->powerUsage = power / 1000;
+            }
+
             nvml.nvmlDeviceGetPowerManagementLimit(device, &limit);
-            metrics->powerUsage = power / 1000;
             metrics->powerLimit = limit / 1000;
         }

Can someone get logs with this patch please?

flightlessmango avatar Apr 10 '25 19:04 flightlessmango

I'm not sure what I'm doing wrong or how to get logs from this. I built lib32-mangohud with the patch and triple-checked that it's patched, but nothing changes and nothing is printed to the terminal either. Should I enable some sort of hidden debug option?

edit: all I get is:
[2025-04-11 13:02:39.875] [MANGOHUD] [info] [gpu.cpp:98] Set renderD128 as active GPU (id=10de:2216 pci_dev=0000:0a:00.0)

Maybe I'm just missing something super obvious.

xpander69 avatar Apr 11 '25 09:04 xpander69

Oh yeah, sorry, you need to use MANGOHUD_LOG_LEVEL=debug

flightlessmango avatar Apr 11 '25 21:04 flightlessmango

Ahaa.. is it because the 570.133 drivers don't seem to have a compatible 32-bit libXNVCtrl?
[2025-04-12 08:31:59.404] [MANGOHUD] [debug] [loader_nvctrl.cpp:39] Failed to open 32bit libXNVCtrl.so.0: libXNVCtrl.so.0: cannot open shared object file: No such file or directory

full log with glxgears32: https://pastebin.com/sHJVZbEQ

xpander69 avatar Apr 12 '25 05:04 xpander69

This looks like a driver regression to me. The API clearly states that values should be reported in mW, and these numbers don't even resemble any power value; they just look random.

Try an older driver version, preferably on a spare OS install, so as not to break your current one.

17314642 avatar Apr 12 '25 07:04 17314642

MangoHud can't be at fault here, because it doesn't manipulate the power values (except dividing by 1000 for unit conversion); it just takes them straight from NVML and immediately stores them.

17314642 avatar Apr 12 '25 07:04 17314642
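
One way to test the claim above, that MangoHud only passes through what NVML returns, is to query NVML from a tiny standalone program built as 32-bit. The following is a minimal sketch, not MangoHud's code: the helper name `read_power_mw` is hypothetical, the type aliases are assumptions that mirror `nvml.h` (where `NVML_SUCCESS` is 0 and `nvmlDevice_t` is an opaque pointer), and the library is loaded with `dlopen` so no NVML headers are needed.

```cpp
#include <dlfcn.h>

#include <cstdio>
#include <optional>

// Assumed to mirror nvml.h: nvmlReturn_t is an int-sized enum with
// NVML_SUCCESS == 0, and nvmlDevice_t is an opaque pointer.
using nvmlReturn_t = int;
using nvmlDevice_t = void*;
constexpr nvmlReturn_t kNvmlSuccess = 0;

using InitFn     = nvmlReturn_t (*)();
using ShutdownFn = nvmlReturn_t (*)();
using HandleFn   = nvmlReturn_t (*)(unsigned int, nvmlDevice_t*);
using PowerFn    = nvmlReturn_t (*)(nvmlDevice_t, unsigned int*);

// Hypothetical helper: returns the raw power reading in milliwatts,
// or std::nullopt if the library, a symbol, or any NVML call fails.
std::optional<unsigned int> read_power_mw(const char* lib_name,
                                          unsigned int gpu_index = 0) {
    void* lib = dlopen(lib_name, RTLD_NOW);
    if (!lib) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return std::nullopt;
    }

    auto init       = reinterpret_cast<InitFn>(dlsym(lib, "nvmlInit_v2"));
    auto shutdown   = reinterpret_cast<ShutdownFn>(dlsym(lib, "nvmlShutdown"));
    auto get_handle = reinterpret_cast<HandleFn>(dlsym(lib, "nvmlDeviceGetHandleByIndex_v2"));
    auto get_power  = reinterpret_cast<PowerFn>(dlsym(lib, "nvmlDeviceGetPowerUsage"));

    std::optional<unsigned int> result;
    if (init && shutdown && get_handle && get_power && init() == kNvmlSuccess) {
        nvmlDevice_t device = nullptr;
        unsigned int milliwatts = 0;
        if (get_handle(gpu_index, &device) == kNvmlSuccess &&
            get_power(device, &milliwatts) == kNvmlSuccess) {
            result = milliwatts;  // raw value, no division applied
        }
        shutdown();
    }
    dlclose(lib);
    return result;
}
```

Calling `read_power_mw("libnvidia-ml.so.1")` from a `main` in a binary built with something like `g++ -std=c++17 -m32` (plus `-ldl` on older glibc) should show whether the bogus values appear without MangoHud involved; the exact build flags depend on the distribution's multilib setup.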

Sadly the 565 drivers won't compile for kernel 6.14. It's too much trouble to patch them or downgrade the kernel just to try, but yeah, I think it's a regression in the driver, as it used to work fine before. My guess is that the regression happened with the 570 drivers.

xpander69 avatar Apr 12 '25 07:04 xpander69

> Ahaa.. is it because 570.133 drivers don't seem to have compatible 32bit libxnvctrl [2025-04-12 08:31:59.404] [MANGOHUD] [debug] [loader_nvctrl.cpp:39] Failed to open 32bit libXNVCtrl.so.0: libXNVCtrl.so.0: cannot open shared object file: No such file or directory
>
> full log with glxgears32: https://pastebin.com/sHJVZbEQ

For good measure, try MangoHud 0.7.2 on 32-bit

17314642 avatar Apr 12 '25 07:04 17314642

> It might be an issue with 3000-series GPUs only. I've seen that people on the 4000 series don't have this issue. So it could be some strange Nvidia bug, but then again nvidia-smi reports everything correctly.

nvidia-smi is a 64-bit application, that's probably why

17314642 avatar Apr 12 '25 07:04 17314642

> For good measure, try MangoHud 0.7.2 on 32-bit

Same issue with 0.7.2, so yeah, probably an Nvidia regression or something

xpander69 avatar Apr 12 '25 07:04 xpander69

OK, with the 575.51.02 drivers it's still broken, but now it reports just 8 mW:

[2025-04-16 22:15:23.005] [MANGOHUD] [debug] Raw power usage (mW): 8
[2025-04-16 22:15:23.031] [MANGOHUD] [debug] Raw power usage (mW): 8
[2025-04-16 22:15:23.058] [MANGOHUD] [debug] Raw power usage (mW): 8
[2025-04-16 22:15:23.085] [MANGOHUD] [debug] Raw power usage (mW): 8
[2025-04-16 22:15:23.111] [MANGOHUD] [debug] Raw power usage (mW): 8

I guess this can be closed, maybe? Or leave it open until Nvidia resolves it? I also reported it here: https://forums.developer.nvidia.com/t/570-release-feedback-discussion/321956/497?u=xpander

xpander69 avatar Apr 16 '25 19:04 xpander69
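
For anyone comparing logs across driver versions, the raw values can be pulled out of the debug output mechanically. This is a small sketch under stated assumptions: it relies only on the `Raw power usage (mW): N` message format introduced by the debug patch earlier in this thread, and the function names `parse_raw_power_mw` and `to_watts` are hypothetical. `to_watts` mirrors the integer `power / 1000` expression from that patch.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: extracts N from a log line containing
// "Raw power usage (mW): N". Returns std::nullopt for other lines.
std::optional<unsigned int> parse_raw_power_mw(std::string_view line) {
    constexpr std::string_view marker = "Raw power usage (mW): ";
    const auto pos = line.find(marker);
    if (pos == std::string_view::npos) {
        return std::nullopt;
    }
    const std::string_view digits = line.substr(pos + marker.size());
    unsigned int value = 0;
    const auto res = std::from_chars(digits.data(),
                                     digits.data() + digits.size(), value);
    if (res.ec != std::errc{}) {
        return std::nullopt;  // no digits, or value out of range
    }
    return value;
}

// Same unsigned integer division as `power / 1000` in the patch:
// milliwatts to whole watts, truncating toward zero.
unsigned int to_watts(unsigned int milliwatts) {
    return milliwatts / 1000;
}
```

With the log lines above, `parse_raw_power_mw` yields 8 for each line, and `to_watts(8)` truncates to 0, so a raw reading of 8 mW comes out as 0 W after that division.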

Let's leave it open until it's resolved by nvidia

flightlessmango avatar Apr 16 '25 20:04 flightlessmango