
HDMI screen turns off when sustained high CPU usage on all cores

Open vanfanel opened this issue 2 years ago • 56 comments

Describe the bug

When CPU cores are under heavy usage (4 cores at ~100%, as with building software using make -j4) for a certain time, HDMI is turned off. The process doing intensive CPU usage continues and completes without problems.

To reproduce

- Update to latest firmware with rpi-update
- Try to build a big C/C++ project via make -j4

Expected behaviour

Simply build the thing.

Actual behaviour

HDMI is turned off. The high CPU process is finished without problems.

System

Raspberry Pi 4b+ v1.2, 2GB RAM

OS version:

pi@raspberrypi:~/src/Commander-Genius/b4 $ cat /etc/rpi-issue 
Raspberry Pi reference 2020-05-27
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 30e2dd32ba47cc3bec15ab1413c16a17e5797775, stage4

Firmware version:

pi@raspberrypi:~ $ vcgencmd version               
Jul 14 2021 14:20:55 
Copyright (c) 2012 Broadcom
version 1ecd7d49359f3b48737f1a9e33c2f1513f90743d (clean) (release) (start)

Kernel version:

pi@raspberrypi:~ $ uname -a
Linux raspberrypi 5.10.49-v8+ #1436 SMP PREEMPT Wed Jul 14 14:20:10 BST 2021 aarch64 GNU/Linux

Additional context

It started to happen after updating from 5.10.17 or so. Didn't happen before.

Also, I use the vc4-hdmi audio device, so in config.txt I have commented out the BCM audio module:


dtoverlay=vc4-kms-v3d
#dtparam=audio=on

vanfanel avatar Jul 15 '21 23:07 vanfanel

Exact rpi-update version when this started would be useful. Are you using fkms or kms driver? Are you saying hdmi output returns when make completes?

popcornmix avatar Jul 16 '21 14:07 popcornmix

@popcornmix Sadly I can't say when this started because I had gone months without updating. Is there any revision you suspect that I could "force"? (I would also need to know how to force it, which I have never done.)

I am using KMS.

HDMI does not come back until I reboot.

vanfanel avatar Jul 16 '21 15:07 vanfanel

It's not an issue I've seen myself or seen reported, so can't really guess. Does dmesg have any errors after this?

popcornmix avatar Jul 16 '21 15:07 popcornmix

What resolution/refresh rate is hdmi monitor? Does a lower one still have the issue? Does force_turbo=1 help?

popcornmix avatar Jul 16 '21 15:07 popcornmix

@popcornmix I cleared dmesg, then ran compilation on all four cores, then just after the monitor goes off, I get these on dmesg:


pi@raspberrypi:~ $ dmesg
[   93.150825] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:76:crtc-3] flip_done timed out
[  103.390806] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:76:crtc-3] flip_done timed out
[  113.630833] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:32:HDMI-A-1] flip_done timed out
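
For anyone collecting the same evidence, here is a minimal sketch of pulling the flip_done timeouts out of a dmesg capture. The function name `find_flip_timeouts` and the regular expression are illustrative assumptions, written only against the three log lines above, so other kernels may format these messages differently.

```python
import re

# Written against lines of this shape (taken from the log above):
# [   93.150825] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:76:crtc-3] flip_done timed out
FLIP_TIMEOUT = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+\[drm:(?P<func>\w+).*\*ERROR\*\s+"
    r"\[(?P<object>[^\]]+)\]\s+flip_done timed out"
)


def find_flip_timeouts(dmesg_text):
    """Return (timestamp_seconds, drm_function, drm_object) per flip_done timeout."""
    hits = []
    for line in dmesg_text.splitlines():
        match = FLIP_TIMEOUT.match(line.strip())
        if match:
            hits.append(
                (float(match.group("time")), match.group("func"), match.group("object"))
            )
    return hits
```

The text to feed in can come from a saved `dmesg` capture; the timestamps then show how long after boot each timeout was logged.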

About resolution and refresh rate, I am using this mode in config.txt to force the mode of my choice:

hdmi_group=2
hdmi_mode=39

force_turbo=1 has no effect.

config_hdmi_boost=4 has no effect.

Setting a "secure" video mode like this:

hdmi_group=2
hdmi_mode=4

...has no effect.

vanfanel avatar Jul 18 '21 21:07 vanfanel

> About resolution and refresh rate, I am using this mode in config.txt to force the mode of my choice:

config.txt settings only affect the initial simple framebuffer mode before kms takes over. You should be using standard linux methods for configuring the hdmi mode (e.g. arandr if using X, or a "video=" cmdline.txt setting).

What is the resolution/refresh rate you are actually using before you get the "flip_done timeout"? What does "vcgencmd get_throttled" report after the "flip_done timeout"?

popcornmix avatar Jul 19 '21 09:07 popcornmix

@popcornmix

For these experiments, I have added this to cmdline.txt:

video=HDMI-A-1:640x480@60

And I am also setting this in config.txt as I said:

hdmi_group=2
hdmi_mode=4

So I am, in fact, using the basic 640x480 at 60Hz video mode. That doesn't change anything (except the console resolution, of course!)

As for the command you asked, this is what it says:

pi@raspberrypi:~/src/sm64ex-alo $ vcgencmd get_throttled
throttled=0xe0000

vanfanel avatar Jul 19 '21 10:07 vanfanel

> So I am, in fact, using the basic 640x480 at 60Hz video mode. That doesn't change anything (except the console resolution, of course!)

I'm still not clear if you are saying the "flip done timeout" is occurring at 640x480@60Hz or a higher resolution. The hdmi mode can be set in many places:

- initially by firmware (affected by hdmi_group/hdmi_mode in config.txt)
- by kernel when creating the console for kms/fkms (affected by video=<> in cmdline.txt)
- by user code like X (e.g. setting the last mode configured with arandr)
- by other applications that can do modesetting (e.g. kodi)

What is the resolution you are running at when you get "flip done timeout", and what were you running? (e.g. X etc)
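
Since the mode can come from several places, one thing that can be checked directly is what the kernel console was asked for. The sketch below reads any `video=` entries out of a kernel command line string. It only understands the simple `[connector:]WIDTHxHEIGHT[@REFRESH]` form used in this thread, not the full option syntax the kernel accepts, and the function name `parse_video_params` is made up for illustration.

```python
import re

# Handles only the simple form seen in this thread, e.g.
#   video=HDMI-A-1:640x480@60
# Extra mode flags that the kernel also accepts are ignored here.
VIDEO_PARAM = re.compile(
    r'video="?(?:(?P<connector>[A-Za-z0-9-]+):)?'
    r"(?P<width>\d+)x(?P<height>\d+)(?:@(?P<refresh>\d+))?"
)


def parse_video_params(cmdline):
    """Return one dict per video= mode found in a kernel command line string."""
    modes = []
    for match in VIDEO_PARAM.finditer(cmdline):
        refresh = match.group("refresh")
        modes.append(
            {
                "connector": match.group("connector"),
                "width": int(match.group("width")),
                "height": int(match.group("height")),
                "refresh": int(refresh) if refresh else None,
            }
        )
    return modes
```

On a running system the string can be read from `/proc/cmdline`. This only reports what was requested at boot; X or another modesetting application may have changed the mode afterwards.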

popcornmix avatar Jul 19 '21 13:07 popcornmix

@popcornmix I usually run my Pi at 1360x768, but when I am running what you ask me to run, etc... then I move to basic 640x480@60Hz. There's nothing wrong with the video mode in use, and it has no impact on the issue, because I have tried a lot of different video modes, from basic 640x480@60Hz to 1080p, and the issue is the same.

So, simply put, I am using 640x480@60 when I report anything on this thread.

I don't have an X server. TTY console, of course, runs on legacy fbdev, just like in every GNU/Linux system as far as I know. When I do compilation on all four cores, there's NOTHING running except the fbdev TTY console.

So, to be clear: I have no Xorg server. I set the video mode in config.txt or in cmdline.txt. Since you told me to use the "video=..." directive in cmdline.txt, that's what I use. That's all.

vanfanel avatar Jul 19 '21 14:07 vanfanel

Does the "flip done" message always correspond to vcgencmd get_throttled returning a non-zero value (i.e. throttling occurring)?

popcornmix avatar Jul 19 '21 16:07 popcornmix

@popcornmix What happens is this.

On idle system, or when running, let's say, SDLPop, Scummvm, etc... I normally get this:

pi@raspberrypi:~/src/Raze/b4 $ vcgencmd get_throttled
throttled=0x0

But after a couple of seconds of building anything with -j3, -j4, etc.. screen goes off, and I get this:

pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0x0
pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0x80008
pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0xe0008
pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0xe0006
pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0xe0008
pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0xe0008
pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0xe0006
pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0xe0008
pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0xe0008
pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0xe0006
pi@raspberrypi:~ $ vcgencmd get_throttled

As you can see, initially it returns zero, and then non-zero values as monitor goes off.

It's always the same.
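
To make those hex values easier to read, here is a small sketch that decodes the `vcgencmd get_throttled` bitfield. The bit meanings follow the Raspberry Pi documentation for that command; the names `THROTTLE_BITS` and `decode_throttled` are illustrative only.

```python
# Bit positions as listed in the Raspberry Pi documentation for
# `vcgencmd get_throttled`. Bits 0-3 describe the current state,
# bits 16-19 record that the condition has occurred since boot.
THROTTLE_BITS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Decode a line such as 'throttled=0xe0008' into a list of flag names."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in sorted(THROTTLE_BITS.items()) if value & (1 << bit)]
```

Read this way, 0xe0008 means the soft temperature limit is active now, while 0xe0006 means the ARM frequency is capped and throttling is in effect now. Neither value has an under-voltage bit set, so the values above point at temperature rather than the power supply.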

vanfanel avatar Jul 19 '21 19:07 vanfanel

@popcornmix I went back from kernel 5.10.50 to stable 5.10.17 (remember I am on a full aarch64 Raspberry Pi OS) via: sudo apt-get install --reinstall raspberrypi-bootloader raspberrypi-kernel

...and after rebooting, I can do whatever I want on all the CPU cores: no more HDMI display turn-off.

Also, the strange lock-ups I had reported on different open source game engines have simply disappeared after going back to stable! https://github.com/lethal-guitar/RigelEngine/issues/662 https://gitlab.com/Dringgstein/Commander-Genius/-/issues/491

vanfanel avatar Jul 22 '21 18:07 vanfanel

@popcornmix Another thing to note is that I always use the vc4-hdmi device (ARM-side ALSA driver). I have added that information to the first post.

vanfanel avatar Jul 24 '21 13:07 vanfanel

For devs trying to reproduce this issue locally on 64-bit Pi OS: simply build a large project in C++ (not plain C) using make -j4. In case you still can't see it happening, do it as root or remove your user's rlimits so you can really cause 100% CPU usage.

The HDMI display will be turned off, that's for sure. It happens with every monitor I use. Official cable & power supply here, btw.

Now 5.10.52 is the "stable" kernel, so it happens again on my system after doing a simple sudo apt-get update && upgrade.

vanfanel avatar Aug 12 '21 20:08 vanfanel

@vanfanel can you give this test firmware a try? I think the issue is when arm throttles (due to high temperature) it was incorrectly reducing core frequency below that required for the hdmi mode.

popcornmix avatar Aug 13 '21 12:08 popcornmix

@popcornmix Tested, but I am sorry to say that it's still happening. Any other experimental firmwares you want me to try, I'll be glad to do so.

vanfanel avatar Aug 14 '21 12:08 vanfanel

Can you report the output of vcgencmd version when using the test firmware? Can you confirm that the display is fine when you have throttled=0x0 or throttled=0x80008, and that the issue occurs only when any additional bits are set?

popcornmix avatar Aug 16 '21 09:08 popcornmix

@popcornmix:


pi@raspberrypi:~ $ vcgencmd version
Aug 13 2021 13:03:32 
Copyright (c) 2012 Broadcom
version 5ffbdf498f77137ac0fbb2f63214eeb3346a3969 (tainted) (release) (start)

Also, when display is fine (on an idle system), I always get:

pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0x0

Then after 1 minute building a C++ project with ninja -j4 or make -j4 I start seeing:

pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0x80000

Then later I see:

pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0x80008

At this point, the display is turned off. And then, just when the display is turned off, I see this:

pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0xe0006

And then, while the display is still off, I see:

pi@raspberrypi:~ $ vcgencmd get_throttled
throttled=0xe0008

...from this point (remember: display is off and won't come back even if the compilation finishes OK), these two 0xe0006 and 0xe0008 alternate. Display is off until I reboot.

vanfanel avatar Aug 16 '21 20:08 vanfanel

This is very surprising, as I can reproduce your description easily (you can even set temp_limit=65 to make it happen more quickly).

With default firmware if I'm in a 4kp60 mode and I hit the temperature limit (signalled by THROTTLED_HIGH_TEMP=2 and THROTTLED_LIMIT_TURBO=4) then core freq gets lowered to 200MHz which isn't enough to sustain 4kp60 and we lose (permanently) display output.

However with the test firmware we no longer limit the core frequency and this issue doesn't occur.

I've just reproduced this on a different Pi4 and it still fixes it.

Can you post your config.txt and cmdline.txt in case they are having an effect. Also report vcgencmd measure_clock core before and after the hdmi output is lost.

popcornmix avatar Aug 17 '21 15:08 popcornmix

1 - Before display goes off:


pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=500000992
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=500000992
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=500000992
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=500000992
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=500000992

(Value is stable until display goes off)

2 - After display goes off:

pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=333333984
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=199995120
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=333333984
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=333333984
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=333333984
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=333333984
pi@raspberrypi:~ $ 
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=333333984
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=333333984
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=499987808
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=500000992
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=500000992
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=500000992
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=500000992
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=500000992
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=333333984
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=200008304
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=333333984
pi@raspberrypi:~ $ vcgencmd measure_clock core
frequency(1)=333333984
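
To summarise readings like these without eyeballing every line, here is a rough sketch that converts `vcgencmd measure_clock core` output to MHz and picks out the samples that dropped below the nominal clock. The 500 MHz nominal value is simply what this log shows before the display goes off, and the names `parse_core_clock` and `below_nominal` are illustrative assumptions.

```python
def parse_core_clock(output):
    """Parse a line such as 'frequency(1)=500000992' and return the value in MHz."""
    _, hertz = output.strip().split("=", 1)
    return int(hertz) / 1_000_000


def below_nominal(samples, nominal_mhz=500.0, tolerance_mhz=1.0):
    """Return the readings (in MHz) that fall more than tolerance_mhz below nominal.

    nominal_mhz defaults to the stable value seen in this log; adjust it
    for a system configured with a different core clock.
    """
    readings = [parse_core_clock(sample) for sample in samples]
    return [mhz for mhz in readings if mhz < nominal_mhz - tolerance_mhz]
```

Applied to the log above, this flags the readings near 333 MHz and 200 MHz, and lets the 499987808 reading pass as being within tolerance of 500 MHz.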

Now, this is my config.txt config.txt

and this is my cmdline.txt cmdline.txt

You will see I am using a slight overclock, but it goes with the corresponding overvoltage. Using official HDMI cable and power supply.

vanfanel avatar Aug 17 '21 21:08 vanfanel

Okay, it's

hdmi_group=2
hdmi_mode=39 #  1360x768  60Hz

that stops it working. If you remove that I suspect the issue will be resolved. Note that hdmi_mode/hdmi_group (and pretty much all hdmi_ settings) don't work with the kms driver (which is driven by settings on the arm side). But I'll try to find out why they don't play well with kms and the temperature limit.

popcornmix avatar Aug 18 '21 13:08 popcornmix

@popcornmix I have removed every hdmi_* setting and I am still seeing the issue. It must be annoying, sorry.

vanfanel avatar Aug 18 '21 15:08 vanfanel

It does seem there are two similar issues here. The first involves core_freq being set too low when throttling. That is easy to reproduce. Use a high clock rate hdmi mode (e.g. 4kp60) and throttle. That is fixed with test firmware (and now rpi-update).

The second seems to be the M2MC clock (aka hsm clock in kernel driver) being set to 0. It is noticed when clocks change after throttling, but the problem seems to occur earlier, and it seems to occur with certain hdmi resolutions (possibly your edid gives the same hdmi mode without the hdmi_ settings).

popcornmix avatar Aug 18 '21 17:08 popcornmix

@popcornmix You are right, it's only happening with the 1360x768 video mode. I have tried other resolutions via the "video=..." kernel parameter, and the issue doesn't show its ugly face.

vanfanel avatar Aug 18 '21 17:08 vanfanel

This thread is worth a read for the 1366x768 mode. https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=284866

JamesH65 avatar Aug 18 '21 19:08 JamesH65

@JamesH65 it's 1360x768, and the display is fine up until the first throttle, so I don't think that thread is relevant.

popcornmix avatar Aug 18 '21 19:08 popcornmix

@vanfanel can you try this test firmware

popcornmix avatar Aug 18 '21 20:08 popcornmix

@popcornmix Tried! The issue at 1360x768 (forced in cmdline.txt via the video=... parameter) is not appearing anymore. So, if you have no objections, this could be closed.

Thanks a lot for looking into this, and for the time you invested in fixing the issue (and in discovering it, to begin with... its happening with only one video mode was unexpected).

vanfanel avatar Aug 19 '21 09:08 vanfanel

Fix should be in latest rpi-update firmware.

popcornmix avatar Aug 19 '21 11:08 popcornmix

@popcornmix Same HDMI poweroff on throttling is happening again in kernel 5.10.76-v8 with video="HDMI-A-1:1280x720@60"

vanfanel avatar Nov 05 '21 07:11 vanfanel