Wrong GPU total memory reported

Open nitroxis opened this issue 1 year ago • 15 comments

Hi, I just noticed that OhmGraphite reports an incorrect total GPU memory size when there are multiple GPUs.

OhmGraphite's prometheus endpoint reports:

ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Total",hw_instance="0"} 11811160064
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Total",hw_instance="1"} 11811160064

whereas the "1060 6GB" should have 6 GB, as the name implies. The value shows up correctly in LibreHardwareMonitor, so LibreHardwareMonitor itself does not appear to be the cause (screenshot not preserved).

nitroxis avatar May 20 '23 09:05 nitroxis
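
As a quick sanity check on the figure above (an editorial sketch, not part of the original thread): the reported 11811160064 bytes is exactly 11 GiB, which fits the RTX 2080 Ti rather than the 6 GB card. The helper name `to_gib` is purely illustrative, and the assumption that a 6 GB card would report roughly 6 GiB is mine.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / GIB


# Value reported for BOTH GPUs in the Prometheus output above.
reported_total = 11811160064

print(to_gib(reported_total))  # 11.0

# Assumption: a 6 GB card would report something close to 6 GiB instead.
print(6 * GIB)  # 6442450944
```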

Thanks for the bug report! Couple of questions to help narrow in on the problem:

  • Is it just the memory total sensor that OhmGraphite reports as the same value?
  • What LibreHardwareMonitor version are you using?

nickbabcock avatar May 20 '23 11:05 nickbabcock

The screenshot was made with the current release version from their GitHub (v0.9.2). I've checked again - it is indeed all three GPU Memory ... metrics that are the same. Here is the full list of ohm_gpunvidia_bytes:

# HELP ohm_gpunvidia_bytes Metric reported by open hardware sensor
# TYPE ohm_gpunvidia_bytes gauge
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Free",hw_instance="0"} 11092885504
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="D3D Shared Memory Used",hw_instance="1"} 155197440
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Used",hw_instance="1"} 717225984
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Total",hw_instance="1"} 11811160064
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Total",hw_instance="0"} 11811160064
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Used",hw_instance="0"} 717225984
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Free",hw_instance="1"} 11092885504
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="D3D Dedicated Memory Used",hw_instance="1"} 1205383168
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="D3D Dedicated Memory Used",hw_instance="0"} 489439232
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="D3D Shared Memory Used",hw_instance="0"} 110133248

nitroxis avatar May 20 '23 11:05 nitroxis
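
One way to spot this pattern without reading the list by eye is to group the samples by sensor name and flag any sensor whose value is identical on every GPU. The following is a minimal sketch that parses the exposition text with regular expressions; the function names (`parse_metrics`, `suspicious_sensors`) are hypothetical and not part of OhmGraphite, and the parser only handles the simple label syntax seen in this thread.

```python
import re
from collections import defaultdict

# Matches lines such as: metric_name{label="value",...} 123
LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        samples.append((name, dict(LABEL_RE.findall(raw_labels)), float(value)))
    return samples


def suspicious_sensors(samples):
    """Sensors reported by several hw_instances with one identical value."""
    by_sensor = defaultdict(dict)
    for _name, labels, value in samples:
        by_sensor[labels["sensor"]][labels["hw_instance"]] = value
    return sorted(
        sensor
        for sensor, per_gpu in by_sensor.items()
        if len(per_gpu) > 1 and len(set(per_gpu.values())) == 1
    )


# The ten samples from the comment above.
SAMPLE = '''
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Free",hw_instance="0"} 11092885504
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="D3D Shared Memory Used",hw_instance="1"} 155197440
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Used",hw_instance="1"} 717225984
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Total",hw_instance="1"} 11811160064
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Total",hw_instance="0"} 11811160064
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Used",hw_instance="0"} 717225984
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Free",hw_instance="1"} 11092885504
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="D3D Dedicated Memory Used",hw_instance="1"} 1205383168
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="D3D Dedicated Memory Used",hw_instance="0"} 489439232
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="D3D Shared Memory Used",hw_instance="0"} 110133248
'''

print(suspicious_sensors(parse_metrics(SAMPLE)))
# ['GPU Memory Free', 'GPU Memory Total', 'GPU Memory Used']
```

On this data only the three GPU Memory sensors are flagged; the two D3D sensors differ between the cards.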

The other ohm_gpunvidia_... metrics appear to be working correctly.

nitroxis avatar May 20 '23 12:05 nitroxis

One thing you can try is the nightly build of OhmGraphite built with LibreHardwareMonitor 0.9.2 (https://github.com/nickbabcock/OhmGraphite/suites/11729082221/artifacts/610719590)

~~If that doesn't fix things, are other sensors like load, wattage, and fans duplicated too?~~ Got it

nickbabcock avatar May 20 '23 12:05 nickbabcock

The nightly build still has this issue.

nitroxis avatar May 20 '23 12:05 nitroxis

Strange, if I compile it myself and launch it in the debugger, it works fine.

nitroxis avatar May 20 '23 12:05 nitroxis

> Strange, if I compile it myself and launch it in the debugger, it works fine.

When you compile and run OhmGraphite yourself, it works!? 😨

That completely stumps me.

Copied below is a bit of an investigation that I went on, but if compiling it yourself works, then it can be ignored.


My best guess is that there's a difference in how LibreHardwareMonitor and OhmGraphite refresh sensors. OhmGraphite refreshes all hardware whenever it needs to send out new metrics. If LibreHardwareMonitor instead batches the refresh and UI update for each hardware component before going on to the next component, it would sidestep the possibility of a hardware sensor relying on a global value.

I feel like this is partially corroborated by the fact that it is only the memory sensors that use a display handle instead of a physical handle: https://github.com/LibreHardwareMonitor/LibreHardwareMonitor/blob/6066b1a79737bb7e23217f0d2bb1b14fab04b9aa/LibreHardwareMonitorLib/Hardware/Gpu/NvidiaGpu.cs#L967

nickbabcock avatar May 20 '23 14:05 nickbabcock
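
The guess above can be illustrated with a toy model. This is only a sketch of the hypothesis, not LibreHardwareMonitor's or OhmGraphite's actual code: `ToyDriver`, its methods, and the device names used as keys are all made up for illustration, and the model assumes a driver that answers memory queries for whichever device was selected last.

```python
GIB = 1024 ** 3


class ToyDriver:
    """Hypothetical driver that keeps a single, global 'selected device'."""

    def __init__(self, totals):
        self._totals = totals
        self._selected = None  # shared state: the last device refreshed

    def select(self, device):
        self._selected = device

    def memory_total(self):
        # Ignores who is asking and answers for the last selected device.
        return self._totals[self._selected]


def refresh_all_then_read(driver, devices):
    """Refresh every device first, then read all sensors."""
    for device in devices:
        driver.select(device)
    return {device: driver.memory_total() for device in devices}


def refresh_and_read_each(driver, devices):
    """Refresh one device and read its sensor before moving on."""
    readings = {}
    for device in devices:
        driver.select(device)
        readings[device] = driver.memory_total()
    return readings


devices = ["GTX 1060", "RTX 2080 Ti"]
driver = ToyDriver({"GTX 1060": 6 * GIB, "RTX 2080 Ti": 11 * GIB})

print(refresh_all_then_read(driver, devices))
# {'GTX 1060': 11811160064, 'RTX 2080 Ti': 11811160064}
print(refresh_and_read_each(driver, devices))
# {'GTX 1060': 6442450944, 'RTX 2080 Ti': 11811160064}
```

In this toy, reading after a bulk refresh makes every device report the value of the last one refreshed, while interleaving refresh and read hides the shared state.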

I wonder, if you execute:

dotnet publish -c Release .\OhmGraphite\

And run the resulting zip, if that'll also show the problem.

nickbabcock avatar May 22 '23 00:05 nickbabcock

I've looked into it a bit more and it appears that it is related to whether the program runs as a normal process or as a service. Running it with OhmGraphite.exe run yields correct results, running it as a service (e.g. OhmGraphite.exe start) yields the wrong results.

nitroxis avatar May 25 '23 14:05 nitroxis

Thanks for looking into it further. This issue looks like a variant of #153. There are various possible solutions within that thread (like https://github.com/nickbabcock/OhmGraphite/issues/153#issuecomment-674433563), though the user ultimately went with the workaround in https://github.com/nickbabcock/OhmGraphite/issues/153#issuecomment-706311993. Their issue involved an AMD GPU, not Nvidia, yet seems eerily similar.

nickbabcock avatar May 26 '23 11:05 nickbabcock

It might be related, though it is strange that all other NVIDIA metrics appear to be working fine and only those 3 are wrong. If it were some kind of permission/session thing, I would've thought either all metrics work, or none (like in the linked issue). Why only the memory metrics, and only for one GPU? Checking the "Interact with desktop" checkbox makes no difference for me. I don't really know how to investigate this further.

nitroxis avatar May 26 '23 16:05 nitroxis

Do these problems still persist in 0.3x? (The issues are not closed.)

What are the workarounds in that case?

roy-spark avatar Jul 17 '23 23:07 roy-spark

I switched from running OhmGraphite as a service to "OhmGraphite run", and it finally reported GPU load percentage. (It was constantly zero when running in service mode.)

roy-spark avatar Aug 10 '23 10:08 roy-spark

Thanks for confirming. Looks like this issue is decently widespread. I'm not sure what causes the issue or what the fix is. Now that OhmGraphite recently started targeting .NET 6, it looks like there is an easy and official way to create Windows services that doesn't rely on a 3rd party library: https://learn.microsoft.com/en-us/dotnet/core/extensions/windows-service?pivots=dotnet-6-0

I might poke at it and see if it's viable and fixes issues.

nickbabcock avatar Aug 12 '23 22:08 nickbabcock

Since OhmGraphite v0.31, the old Windows service library has been replaced with the newer, official Microsoft implementation. Let me know if this fixes the situation.

nickbabcock avatar Jan 29 '24 12:01 nickbabcock