MangoHud
GPU Usage stuck at 0-1%
When launching an application, the reported GPU usage briefly spikes above 100%, then gets stuck at 0-1% even though other monitoring tools report a much higher percentage.
This issue does not reproduce on MangoHud 0.6.6.
Using Vulkan Examples' pbribl example:
radeontop -c:
nvtop:
sudo umr -O use_color -t:
hexdump -C /sys/class/drm/card*/device/gpu_metrics:
00000000 80 00 02 02 b5 18 57 17 12 16 89 17 5d 16 25 17 |......W.....].%.|
00000010 12 16 bb 17 12 16 d4 17 ed 17 8f 16 47 22 00 00 |............G"..|
00000020 70 d2 24 1e d1 03 00 00 12 00 64 34 43 03 ff ff |p.$.......d4C...|
00000030 87 00 21 00 01 04 07 00 20 00 fc 00 ef 00 63 00 |..!..... .....c.|
00000040 d1 05 bd 01 ff ff 20 03 90 01 ff ff cf 05 bd 01 |...... .........|
00000050 06 00 20 03 90 01 90 01 78 05 78 05 78 05 78 05 |.. .....x.x.x.x.|
00000060 78 05 78 05 78 05 78 05 78 05 78 05 06 00 00 00 |x.x.x.x.x.x.....|
00000070 00 00 ff ff ff ff ff ff 60 00 00 00 00 00 00 00 |........`.......|
00000080
hexdump -C /sys/class/drm/card0/device/gpu_busy_percent:
00000000 38 37 0a |87.|
00000003
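The two dumps above can be cross-checked against each other. The sketch below is only an illustration: `read_u16_le`, `kAssumedGfxActivityOffset`, `kRow` and `check_gfx_activity` are hypothetical names, and the offset 0x1c is an assumption about where the average graphics activity sits in this revision of the metrics table (the header bytes `02 02` suggest format 2, content 2). Under that assumption, the bytes `47 22` decode to 8775, i.e. 87.75% expressed in centipercent, which agrees with the `87` read from gpu_busy_percent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a little-endian 16-bit value from a raw byte buffer.
constexpr std::uint16_t read_u16_le(const std::uint8_t* data, std::size_t offset) {
    return static_cast<std::uint16_t>(data[offset] | (data[offset + 1] << 8));
}

// ASSUMPTION: absolute offset of the average graphics activity field in the
// gpu_metrics table shown above. This is inferred, not stated in the thread.
constexpr std::size_t kAssumedGfxActivityOffset = 0x1c;

// Bytes 0x10..0x1f copied verbatim from the gpu_metrics hexdump above.
constexpr std::uint8_t kRow[16] = {0x12, 0x16, 0xbb, 0x17, 0x12, 0x16, 0xd4, 0x17,
                                   0xed, 0x17, 0x8f, 0x16, 0x47, 0x22, 0x00, 0x00};

// kRow starts at absolute offset 0x10, so subtract that to index into it.
constexpr std::uint16_t kRawActivity =
    read_u16_le(kRow, kAssumedGfxActivityOffset - 0x10);

// Hypothetical self-check: the raw value is in centipercent, so dividing by
// 100 should match the whole-percent value from gpu_busy_percent.
inline void check_gfx_activity() {
    assert(kRawActivity == 8775);
    assert(kRawActivity / 100 == 87);
}
```

The point of the check is that the raw value does not fit in 8 bits, so any code path that stores it in a single byte will lose the high byte before the centipercent correction runs.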
Side-by-Side:
System Information:
- CPU: 16x AMD Ryzen 7 PRO 4750U with Radeon Graphics
- RAM: 32GB 3200MHz DDR4
- GPU: AMD RENOIR (LLVM 13.0.1, DRM 3.44, 5.17.5-arch1-1) / AMD RADV RENOIR
- Kernel Version: 5.17.5-arch1-1
- Driver: Mesa 22.0.2
- MangoHud Version: 0.6.7
I also ran a git bisect between 5349226 and v0.6.7. Logs are attached below:
git bisect start
# good: [5349226fa50f98c7d3328258112f48865b96cddb] amdgpu: average load over .5s
git bisect good 5349226fa50f98c7d3328258112f48865b96cddb
# bad: [663bbd05a60c7d1e3fd352fdd8c55e96bd8af0f2] Bump to 0.6.7
git bisect bad 663bbd05a60c7d1e3fd352fdd8c55e96bd8af0f2
# bad: [f9cfdeb0804779a9957bcf956b9dfc63956a23b4] Add gpu throttling status
git bisect bad f9cfdeb0804779a9957bcf956b9dfc63956a23b4
# good: [350dca5d2196c166d090fc783a6d8da607fe789e] Dynamic width when fps_only
git bisect good 350dca5d2196c166d090fc783a6d8da607fe789e
# bad: [ae85730448f3ac7c895e5669f48aab032abb3040] Improve amdgpu polling
git bisect bad ae85730448f3ac7c895e5669f48aab032abb3040
# first bad commit: [ae85730448f3ac7c895e5669f48aab032abb3040] Improve amdgpu polling
(I started with 5349226 because previous commits were broken by #731)
Can you apply this patch to latest and post the terminal output here?
index 911d931..c7535b0 100644
--- a/src/amdgpu.cpp
+++ b/src/amdgpu.cpp
@@ -148,6 +148,7 @@ void amdgpu_metrics_polling_thread() {
// Detect and fix if the gpu load is reported in centipercent
if (gpu_load_needs_dividing || metrics_buffer[cur_sample_id].gpu_load_percent > 100){
+ printf("AMDGPU load assuming centipercent because we recieved: %i\n", metrics_buffer[cur_sample_id].gpu_load_percent);
gpu_load_needs_dividing = true;
metrics_buffer[cur_sample_id].gpu_load_percent /= 100;
}
I believe this patch should fix the issue, can you confirm?
index 911d931..f2f035f 100644
--- a/src/amdgpu.cpp
+++ b/src/amdgpu.cpp
@@ -16,8 +16,8 @@ std::string metrics_path = "";
*/
struct amdgpu_common_metrics {
/* Load level: averaged across the sampling period */
- uint8_t gpu_load_percent;
- // uint8_t mem_load_percent;
+ uint16_t gpu_load_percent;
+ // uint16_t mem_load_percent;
/* Power usage: averaged across the sampling period */
float average_gfx_power_w;
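For readers wondering why the field width matters, here is a minimal sketch of the failure mode. It is not MangoHud's code: `LoadReader`, `read` and `check_load_reader` are hypothetical names, the averaging over samples is left out, and only the "divide by 100 once a value above 100 has been seen" heuristic from the debug patch above is reproduced. It assumes the driver reports the load in centipercent as a 16-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the load field plus the centipercent heuristic.
// Field is the storage type of gpu_load_percent (uint8_t before the patch,
// uint16_t after it).
template <typename Field>
struct LoadReader {
    bool needs_dividing = false;  // sticky, like gpu_load_needs_dividing

    int read(std::uint16_t raw) {
        // With Field = uint8_t this keeps only the low byte of the raw value.
        Field stored = static_cast<Field>(raw);
        if (needs_dividing || stored > 100) {
            needs_dividing = true;
            stored = static_cast<Field>(stored / 100);
        }
        return stored;
    }
};

// Hypothetical self-check using 8700 and 8775 centipercent (87.00% and 87.75%).
inline void check_load_reader() {
    LoadReader<std::uint8_t> narrow;
    assert(narrow.read(8700) == 2);  // 8700 -> low byte 252 -> 252 / 100
    assert(narrow.read(8775) == 0);  // 8775 -> low byte 71 -> 71 / 100

    LoadReader<std::uint16_t> wide;
    assert(wide.read(8700) == 87);
    assert(wide.read(8775) == 87);
}
```

With an 8-bit field the stored value can never exceed 255, so after the division it can only be 0, 1 or 2, which would be consistent with the reading staying near 0-1%. With a 16-bit field the full centipercent value survives and the division yields the expected percentage.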
Can confirm this is happening to me.
Yes, this patch indeed fixes the issue.
Sidenote: I still observe that when first launching the application, the GPU load percentage will go from 0 -> 3000 -> 0 -> 50 -> 75 -> ... until it reaches the actual GPU load percentage (~87% for the demo). Should I file another bug report for that issue?
I've pushed a new commit that should address that issue as well, can you confirm?
~~Yes, I can confirm that this issue is fixed. Thanks!~~
Edit: I realized I tested this on a system which doesn't have this issue, so I will need a bit more time to try it again. Sorry!
OK, I just tried it, and it works fine. I didn't test it extensively; I just loaded into the Rocket League main screen and saw GPU usage at around 100% with the new patch, vs 0% with the old one.
I can confirm that both the incorrect GPU load and the initial GPU load spike are fixed. Thanks!
Sidenote (again): I see that MangoHud takes around 5-8 (increasing) measurements before it will stabilize at the current GPU load. Is that normal?
Is one measurement taken every 0.5 seconds? I just see one incorrect value and then it's correct.
What I mean is, it will gradually ramp up like: 0, 49, 58, 72, 79, 83, 86, 87, 88 before it stays at ~88% utilization. In other words, the GPU utilization metric appears to take around 3 seconds to ramp up to the expected percentage.