
Use libdrm_amdgpu as an alternative GPU load information source

Open bsolos opened this issue 2 years ago • 18 comments

Use libdrm_amdgpu to calculate the GPU load. This should resolve #923.

The GPU load calculation method was inspired by radeontop, but no code in this PR was copied from there.

bsolos avatar Feb 15 '23 13:02 bsolos
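The radeontop-style calculation mentioned in the description can be sketched roughly as below. This is not the PR's code: `busy_percent`, `read_status` and `kGuiActiveBit` are hypothetical names, and the assumption that bit 31 of GRBM_STATUS is the "GUI active" flag should be checked against the register headers for the ASIC in question. The register read is abstracted behind a callback, so the sketch only shows the sampling arithmetic.

```cpp
#include <cstdint>
#include <functional>

// Assumption: bit 31 of GRBM_STATUS is GUI_ACTIVE, the flag radeontop-style
// tools treat as "graphics pipe busy". Verify this for your hardware.
constexpr std::uint32_t kGuiActiveBit = 1u << 31;

// Hypothetical helper: calls `read_status` `samples` times and returns the
// share of reads in which the busy bit was set, as a percentage.
// `read_status` stands in for whatever performs the actual register read.
double busy_percent(const std::function<std::uint32_t()>& read_status,
                    int samples) {
    if (samples <= 0) return 0.0;
    int busy = 0;
    for (int i = 0; i < samples; ++i) {
        if (read_status() & kGuiActiveBit) ++busy;
    }
    return 100.0 * busy / samples;
}
```

A real sampler would spread the reads over the update interval (for example, by sleeping between them) rather than reading back to back.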

It works properly on my machine, but I don't have any other Vega cards to test it on. It should also work on most non-Vega AMD GPUs.

bsolos avatar Feb 18 '23 19:02 bsolos

https://github.com/flightlessmango/MangoHud/pull/703

Also you'd need drm master authorization, with secondary GPUs at least.

jackun avatar Mar 04 '23 19:03 jackun

> Also you'd need drm master authorization, with secondary GPUs at least.

This should not be needed if one uses the renderD node. It seems the card node is currently being used.

If the card node is already open, you can get the render node name from the fd with drmGetRenderDeviceNameFromFd(). Alternatively, drmGetDevices2() gives you all devices: find the needed one by matching the card node, then use its render node. This MR does something different but should give you a good starting point.

evelikov avatar Mar 04 '23 20:03 evelikov
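As an illustration of the card-to-render-node mapping described above, here is a libdrm-free sketch that reads sysfs instead of calling drmGetRenderDeviceNameFromFd() or drmGetDevices2(). It relies on the sysfs layout in which `/sys/class/drm/cardN/device/drm/` lists all DRM nodes of the same device. `render_node_for_card` is a hypothetical helper, not part of libdrm or MangoHud.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: given a primary node name such as "card0", return the
// name of the render node on the same device (e.g. "renderD128"), if any.
// `sysfs_drm` is normally "/sys/class/drm"; it is a parameter so the function
// can be exercised against a fake directory tree.
std::optional<std::string> render_node_for_card(const fs::path& sysfs_drm,
                                                const std::string& card) {
    const fs::path siblings = sysfs_drm / card / "device" / "drm";
    std::error_code ec;
    for (fs::directory_iterator it(siblings, ec), end; !ec && it != end;
         it.increment(ec)) {
        const std::string name = it->path().filename().string();
        if (name.rfind("renderD", 0) == 0) return name;  // starts with "renderD"
    }
    return std::nullopt;  // directory missing, unreadable, or no render node
}
```

The returned name would then be opened under `/dev/dri/`. With libdrm available, the two functions named in the comment above are the more direct route.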

@bsolos overall I would encourage you to open a bug at the AMDGPU gitlab, clearly describe the issue (exact kernel, version, distro, use-case, etc) + CC Alex Deucher aka @agd5f

This is a great workaround, but the upstream gpu metrics should really be fixed.

evelikov avatar Mar 04 '23 20:03 evelikov

I didn't open an issue there because https://gitlab.freedesktop.org/drm/amd/-/issues/1932 is already open. It seemed like there was no progress in the last 10 months, so I thought that this workaround might be beneficial to MangoHud. Should I still open a new issue?

bsolos avatar Mar 06 '23 14:03 bsolos

One should not assume that devs don't care about an issue just because there's no update. Sometimes they have higher or other priorities; sometimes issues fall through the cracks.

By opening or prodding an issue you'll increase its visibility and raise its severity. If you can test kernel patches, it's more likely that devs will try to get it fixed faster. Sitting quietly does not help, I'm afraid.

evelikov avatar Mar 06 '23 14:03 evelikov

Wouldn't something like https://www.kernel.org/doc/html/latest/gpu/drm-usage-stats.html make more sense than polling hardware registers? Plus it's cross-vendor.

Alex

On Mon, Mar 6, 2023 at 12:05 PM bsolos wrote:

> In src/amdgpu_libdrm.cpp (https://github.com/flightlessmango/MangoHud/pull/925#discussion_r1126768532):
>
>     @@ -51,6 +52,13 @@ static int libdrm_initialize() { return -1; }
>     + char *renderD = drmGetRenderDeviceNameFromFd(fd);
>     + fd = open(renderD, O_RDWR);
>
> Sorry, I've never really worked with libdrm before. Will fix shortly


agd5f avatar Mar 06 '23 17:03 agd5f

> @bsolos overall I would encourage you to open a bug at the AMDGPU gitlab, clearly describe the issue (exact kernel, version, distro, use-case, etc) + CC Alex Deucher aka @agd5f
>
> This is a great workaround, but the upstream gpu metrics should really be fixed.

Seems like the load sensor isn't supported on the hardware level

bsolos avatar Mar 06 '23 19:03 bsolos

The other problem with polling registers is that it keeps the GPU awake, which uses more power. The driver has to disable gfxoff when you read back registers.

Alex


agd5f avatar Mar 07 '23 13:03 agd5f

drmGetStats doesn't seem to work on the renderD node, and using it on the primary node always gives stats.count=0. Maybe I have to authenticate first?

bsolos avatar Mar 07 '23 14:03 bsolos

@agd5f perhaps an in-tree kernel AMDGPU doc outlining the preferred options and their caveats would be great. Something people can keep an eye on as things evolve: say the team introduces a new method to fetch X, or method Y has issues (such as the gfxoff issue mentioned), or approach Z might be deprecated (ETA, reason), etc.

evelikov avatar Mar 07 '23 14:03 evelikov

@bsolos drmGetStats is a legacy API and should not be used. As the in-kernel comment says: "getstats is defunct, just clear".

evelikov avatar Mar 07 '23 14:03 evelikov

This makes sense now. It seems like finding what is the correct way is much more difficult than I thought

I use the register-polling approach because that's what radeontop does, and it works

bsolos avatar Mar 07 '23 14:03 bsolos

For reference on how to use the fdinfo interface see: https://www.spinics.net/lists/intel-gfx/msg294401.html


agd5f avatar Mar 07 '23 17:03 agd5f
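For orientation, the fdinfo interface referenced above exposes per-client text keys in `/proc/<pid>/fdinfo/<fd>`, including `drm-engine-<name>: <value> ns` lines as described in the kernel's drm-usage-stats document. A minimal sketch of parsing them and turning two samples into a percentage follows. `parse_drm_engines` and `engine_percent` are hypothetical helper names, and the exact set of engine names is driver-specific.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Collect "drm-engine-<name>: <value> ns" lines from the text of one
// fdinfo entry into a name -> busy-nanoseconds map.
std::map<std::string, std::uint64_t> parse_drm_engines(const std::string& text) {
    static const std::string prefix = "drm-engine-";
    static const std::string capacity = "capacity-";
    std::map<std::string, std::uint64_t> engines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) != 0) continue;
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        const std::string name =
            line.substr(prefix.size(), colon - prefix.size());
        // "drm-engine-capacity-<name>" is a separate key, not a time counter.
        if (name.compare(0, capacity.size(), capacity) == 0) continue;
        std::istringstream value(line.substr(colon + 1));
        std::uint64_t ns = 0;
        if (value >> ns) engines[name] = ns;
    }
    return engines;
}

// Busy percentage of one engine between two samples taken `wall_ns` apart.
double engine_percent(std::uint64_t before_ns, std::uint64_t after_ns,
                      std::uint64_t wall_ns) {
    if (wall_ns == 0 || after_ns < before_ns) return 0.0;
    return 100.0 * static_cast<double>(after_ns - before_ns) /
           static_cast<double>(wall_ns);
}
```

Because the counters only ever accumulate, utilisation always comes from the difference between two reads, never from a single read.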

@bsolos the site/link is down; see https://patchwork.freedesktop.org/series/102175/

@agd5f does that interface provide system-wide statistics? It seems to be per-client and per-fd, whereas mangohud exposes the total system data. Technically one could iterate over /proc/foo/fdinfo for the total, assuming they have permissions, yet mangohud should not be run as root.

evelikov avatar Mar 07 '23 21:03 evelikov

Yes, it's per client. Similar to top for the CPU.

Alex


agd5f avatar Mar 08 '23 14:03 agd5f
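If one did try to build a system-wide figure from the per-client data, as discussed above, the same client can show up under several file descriptors (duplicated or shared fds), so entries would need de-duplicating by `drm-client-id` before summing. A rough sketch, with `DrmFdSample` and `total_gfx_ns` as hypothetical names and a single GPU assumed:

```cpp
#include <cstdint>
#include <set>
#include <vector>

// One parsed fdinfo entry; the field selection is illustrative only.
struct DrmFdSample {
    std::uint64_t client_id;  // value of "drm-client-id"
    std::uint64_t gfx_ns;     // value of "drm-engine-gfx"
};

// Sum engine time over distinct clients, counting each client id once even
// if it appears under several file descriptors.
std::uint64_t total_gfx_ns(const std::vector<DrmFdSample>& samples) {
    std::set<std::uint64_t> seen;
    std::uint64_t total = 0;
    for (const auto& s : samples) {
        if (seen.insert(s.client_id).second) total += s.gfx_ns;
    }
    return total;
}
```

With more than one GPU the key would also need to include the device (the `drm-pdev` value). The permission concern raised above still applies: processes owned by other users are generally not readable without extra privileges.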

~~Hello.~~ ~~I am developing amdgpu_top.~~
~~amdgpu_top has simple fdinfo parser and performance counters (GRBM, GRBM2, CP_STAT) readings and sensor readings implemented.~~
~~Would it help you?~~

Umio-Yasuno avatar Apr 03 '23 19:04 Umio-Yasuno

~~working branch: https://github.com/Umio-Yasuno/amdgpu_top/tree/json-output~~

Umio-Yasuno avatar Apr 03 '23 22:04 Umio-Yasuno