MangoHud reports two GPUs on one GPU system

Open FoxieFlakey opened this issue 7 months ago • 18 comments

Do not report issue for old MangoHud versions

Describe the bug MangoHud reports two GPUs when there is really no second GPU; the only GPU actually present is the iGPU

List relevant hardware/software information

  • Linux Distribution: Alpine Linux edge
  • MangoHud version: v0.8.1
  • GPU: Intel Ultra High Definition 600
  • CPU: Intel Celeron N4000 (two cores)
  • Kernel: Linux 6.15-rc6

To Reproduce Steps to reproduce the behavior:

  1. Compile MangoHud from master branch at commit "amdgpu: always clear the TEMP_HOTSPOT throttling flag bit" (d297a57fc2c)
  2. Run MangoHud on glxgears
  3. Witness MangoHud hallucinating a second, nonexistent GPU

Expected behavior Only one GPU reported

Screenshots

Comparison of v0.7.1 from the distro and v0.8.1 compiled from source: the newer build shows two GPUs (incorrect), while the older MangoHud reports one GPU (correct)

FoxieFlakey avatar May 24 '25 14:05 FoxieFlakey

Should be resolved here d1a7096

flightlessmango avatar May 29 '25 16:05 flightlessmango

@flightlessmango No, it doesn't work. Two GPUs are still reported.

On the other hand, should I create a new issue or keep it here? My iGPU frequency is also reported less accurately than in the previous version, and iGPU power is not reported at all (it stays at 0.0 W regardless of load). The statistics for the "second ghost GPU" are all 0.

FoxieFlakey avatar May 30 '25 02:05 FoxieFlakey

  1. Regarding the two GPUs, run ls /sys/class/drm

  2. Power usage is not available because of a lack of permissions on Intel integrated graphics (see https://github.com/flightlessmango/MangoHud#metrics-support-by-gpu-vendordriver)

    It works in 0.7.1 because that version uses intel_gpu_top, which is launched with the necessary permissions. Since 0.8.0, MangoHud no longer uses intel_gpu_top and instead reads all the required information itself.

    The problem is that some metrics are not available without root rights or certain capabilities[1].

    MangoHud can gain neither root nor any capabilities because it lives entirely inside the game process, so giving root rights or capabilities to MangoHud means giving them to the game, which is not possible.

    So to get iGPU power usage back, MangoHud would need to either start using intel_gpu_top again or run as a separate process, which is not currently possible (it wasn't written that way).

[1] - Capabilities are a fine-grained permission system that lets a program run as a normal user with some additional root privileges, without gaining full root access (https://wiki.archlinux.org/title/Capabilities)
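
As a rough illustration of what a capability looks like from inside a process, here is a minimal Python sketch that checks for CAP_PERFMON by parsing the CapEff line of a /proc/<pid>/status dump. The helper name has_capability is made up for this example and is not MangoHud code; the bit number 38 for CAP_PERFMON is taken from linux/capability.h.

```python
CAP_PERFMON = 38  # bit number of CAP_PERFMON in <linux/capability.h>


def has_capability(status_text: str, cap_bit: int = CAP_PERFMON) -> bool:
    """Return True if the CapEff mask in a /proc/<pid>/status dump has cap_bit set.

    status_text is the full text of a status file; CapEff is a hex bitmask
    of the capabilities that are currently effective for that process.
    """
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split(":", 1)[1].strip(), 16)
            return bool(mask & (1 << cap_bit))
    # No CapEff line found: treat the capability as absent.
    return False
```

Feeding it the contents of /proc/self/status from code running inside a game would report the game's own capabilities, which is the point made above: an in-process overlay has exactly the privileges of its host process.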

17314642 avatar May 30 '25 04:05 17314642

  1. The result is
~ $ ls /sys/class/drm/
card0	    card0-HDMI-A-1  card1	renderD129
card0-DP-1  card0-eDP-1     renderD128	version
  2. About power usage, what permission does it need? My iGPU did report power usage on the older MangoHud, which used intel_gpu_top.

And what about the clock? It is rounded to 100 MHz steps, while the older version reported it in 1 MHz steps.

FoxieFlakey avatar May 30 '25 06:05 FoxieFlakey

  1. There is your problem, you have two render devices: renderD128 and renderD129. Do ls -l /sys/class/drm/renderD*/device/driver

  2. intel_gpu_top needs either root or CAP_PERFMON to work. Mangohud doesn't use intel_gpu_top anymore

The reason GPU frequency is rounded to the nearest 100 MHz is that Intel reports it that way. To get 1 MHz resolution you have to use Intel's debugfs interface, which requires root.

17314642 avatar May 30 '25 09:05 17314642

  1. The output is
~ $ ls /sys/class/drm/
card0	    card0-HDMI-A-1  card1	renderD129
card0-DP-1  card0-eDP-1     renderD128	version
~ $ ls -l /sys/class/drm/renderD*/device/driver
lrwxrwxrwx 1 root root 0 May 30 17:40 /sys/class/drm/renderD128/device/driver -> ../../../bus/pci/drivers/i915
~ $
  2. So the 100 MHz interval is intended by Intel? Also, the part about 1 MHz needing root isn't entirely true, at least on my system, because intel_gpu_top works fine for me without root.

FoxieFlakey avatar May 30 '25 10:05 FoxieFlakey

  1. do ls -l /sys/class/drm/renderD*/

  2. About 100 MHz, I guess it is intentional, so as not to give out too much info: Intel's Linux devs consider a lot of things an "information leak", hence requiring root everywhere (why they apply this approach only to Linux and not Windows is a mystery to me)

    Is your intel_gpu_top setcapped? Do getcap $(which intel_gpu_top)

17314642 avatar May 30 '25 16:05 17314642

Here are the results for the first and second

~/MyPullRequests/MangoHud $ ls -l /sys/class/drm/renderD*/
/sys/class/drm/renderD128/:
total 0
-r--r--r-- 1 root root 4096 May 31 09:46 dev
lrwxrwxrwx 1 root root    0 May 31 09:43 device -> ../../../0000:00:02.0
drwxr-xr-x 2 root root    0 May 31 09:46 power
lrwxrwxrwx 1 root root    0 May 31 09:43 subsystem -> ../../../../../class/drm
-rw-r--r-- 1 root root 4096 May 31 09:43 uevent

/sys/class/drm/renderD129/:
total 0
-r--r--r-- 1 root root 4096 May 31 09:46 dev
lrwxrwxrwx 1 root root    0 May 31 09:43 device -> ../../../vgem
drwxr-xr-x 2 root root    0 May 31 09:46 power
lrwxrwxrwx 1 root root    0 May 31 09:43 subsystem -> ../../../../../class/drm
-rw-r--r-- 1 root root 4096 May 31 09:43 uevent
~/MyPullRequests/MangoHud $ getcap "$(which intel_gpu_top)"
~/MyPullRequests/MangoHud $

FoxieFlakey avatar May 31 '25 02:05 FoxieFlakey

VGEM is the Virtual GEM provider and has been around for a while as a minimal non-hardware backed Graphics Execution Manager (GEM) memory management service. It's used by LLVMpipe and other non-native 3D driver scenarios for buffer sharing. VGEM is good for improved software rasterizer performance and has been part of the mainline kernel for the better part of a decade.

This is your second "gpu"

17314642 avatar May 31 '25 06:05 17314642

Do cat /proc/sys/kernel/perf_event_paranoid as I found out this also allows intel_gpu_top to run without root

17314642 avatar May 31 '25 06:05 17314642

Okay, so to run intel_gpu_top on Linux Mint without root, I need:

  1. sudo setcap cap_perfmon=+ep $(which intel_gpu_top)
  2. echo 3 | sudo tee /proc/sys/kernel/perf_event_paranoid

17314642 avatar May 31 '25 06:05 17314642

okay... I have

$ cat /proc/sys/kernel/perf_event_paranoid
0
$ 

FoxieFlakey avatar May 31 '25 14:05 FoxieFlakey

VGEM is the Virtual GEM provider and has been around for a while as a minimal non-hardware backed Graphics Execution Manager (GEM) memory management service. It's used by LLVMpipe and other non-native 3D driver scenarios for buffer sharing. VGEM is good for improved software rasterizer performance and has been part of the mainline kernel for the better part of a decade.

This is your second "gpu"

oh that makes sense... so mangohud now showing two GPUs instead of one is related to the intel_gpu_top changes (because it now accesses sysfs directly)?

FoxieFlakey avatar May 31 '25 14:05 FoxieFlakey

okay... I have

$ cat /proc/sys/kernel/perf_event_paranoid
0
$ 

In your case you have almost all access to all perf events, maybe that's why intel_gpu_top works without root for you

But it's not a default value for almost all distros

17314642 avatar May 31 '25 14:05 17314642

Regarding the second "gpu", I'll add a check to skip any renderD* device that mangohud doesn't support
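
A check like that could look roughly like the Python sketch below. It is not the actual MangoHud fix, only an illustration under assumptions: the function names and the SUPPORTED_DRIVERS allow-list are hypothetical (i915 and amdgpu are simply the two driver names that appear in this thread), and it relies on the observation from the ls output above that the vgem node has no device/driver link.

```python
import os
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical allow-list for illustration; the real set of supported
# drivers is not stated in this thread.
SUPPORTED_DRIVERS = frozenset({"i915", "amdgpu"})


def render_node_driver(node: Path) -> Optional[str]:
    """Return the kernel driver name bound to a renderD* node, or None."""
    link = node / "device" / "driver"
    if not link.is_symlink():
        # Virtual nodes such as vgem expose no device/driver link.
        return None
    return os.path.basename(os.readlink(link))


def supported_render_nodes(
    drm_root: str = "/sys/class/drm",
    supported=SUPPORTED_DRIVERS,
) -> List[Tuple[str, str]]:
    """List (node, driver) pairs for render nodes whose driver is supported."""
    result = []
    for node in sorted(Path(drm_root).glob("renderD*")):
        driver = render_node_driver(node)
        if driver in supported:
            result.append((node.name, driver))
    return result
```

On the system from this issue, such a filter would be expected to keep renderD128 (i915) and drop renderD129 (vgem), so only one GPU would be shown.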

17314642 avatar May 31 '25 14:05 17314642

In your case you have almost all access to all perf events, maybe that's why intel_gpu_top works without root for you

But it's not a default value for almost all distros

That might explain it. I set it to 0 so I can use 'perf record' to profile my projects unprivileged, and didn't expect it to be related to i915 GPU stuff.

FoxieFlakey avatar May 31 '25 15:05 FoxieFlakey

Actually there are two ways to launch intel_gpu_top without root:

  1. sudo setcap cap_perfmon=+ep $(which intel_gpu_top)
     echo 3 | sudo tee /proc/sys/kernel/perf_event_paranoid

  2. echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid
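
The two options above boil down to a simple rule, sketched here in Python. This is only a summary of what was observed in this thread, not a statement of the kernel's full permission logic; both function names are invented for the example.

```python
def read_perf_event_paranoid(
    path: str = "/proc/sys/kernel/perf_event_paranoid",
) -> int:
    """Read the current perf_event_paranoid level as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def intel_gpu_top_should_work_unprivileged(
    paranoid: int, has_cap_perfmon: bool
) -> bool:
    """Apply the rule of thumb from this thread.

    Option 1: the binary carries CAP_PERFMON, so the paranoid level
              can stay restrictive (for example 3).
    Option 2: no capability, but perf_event_paranoid is 0 or lower.
    """
    return has_cap_perfmon or paranoid <= 0
```

With the values reported earlier in the thread (perf_event_paranoid of 0 and an empty getcap result), the second function returns True, which matches intel_gpu_top working without root on that machine.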

17314642 avatar May 31 '25 19:05 17314642

okie!

FoxieFlakey avatar Jun 01 '25 02:06 FoxieFlakey