userland
EGL context creation hangs after VideoCore crash
I have seen similar issues to this one, but they are quite old and may not be related to the same bug. I compiled SDL2 according to https://solarianprogrammer.com/2015/01/22/raspberry-pi-raspbian-getting-started-sdl-2/.
At some point the display freezes and the app has to be killed with SIGKILL. After this, any application that tries to create an EGL context fails to start at all. The only way I have found to recover is to reboot the system.
This has happened on both an RPi2 with the newest Raspbian distro (Linux rpi2 4.1.6-v7+ #810 SMP PREEMPT Tue Aug 18 15:32:12 BST 2015 armv7l GNU/Linux) and a version 1 B with the RetroPie distro (Linux retropie 3.18.11+ #781 PREEMPT Tue Apr 21 18:02:18 BST 2015 armv6l GNU/Linux).
I tested a very simple program, found here: http://pastebin.com/Vnje5sEe, which uses the Pi GL API directly (not SDL2), and from what I can see the call to eglCreateContext never returns.
I do not have exact steps to reproduce this error yet, but in my opinion a call that never returns should not happen in any case.
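For anyone scripting around this, a call that never returns can at least be detected from the outside. The following is only a minimal sketch: check_for_hang is a hypothetical helper name, it assumes GNU coreutils timeout is installed, and it uses SIGKILL because the report above says that is what it takes to get rid of the frozen app.

```sh
#!/bin/sh
# check_for_hang is a hypothetical helper, not part of any Raspberry Pi tool.
# Usage: check_for_hang SECONDS COMMAND [ARGS...]
# Returns 0 if COMMAND finished on its own, 1 if it was still running
# after SECONDS and had to be killed.
check_for_hang() {
    limit="$1"
    shift
    # Send SIGKILL on timeout, since the hung app reportedly ignores less.
    timeout -s KILL "$limit" "$@"
    status=$?
    # GNU timeout reports a timeout as 124, or as 137 (128+9) when the
    # child was killed with SIGKILL. Caveat: a command that exits with
    # one of these codes by itself would be misclassified as a hang.
    if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
        echo "hang: '$*' still running after ${limit}s" >&2
        return 1
    fi
    return 0
}
```

It could be used as check_for_hang 10 ./egl_test, where ./egl_test stands in for whatever binary was built from the pastebin program (the name is made up here). This only detects the hang; it does nothing to recover the GPU.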
The problem isn't that eglCreateContext doesn't return - it sounds like the gpu has crashed. I suspect that video playback (e.g. hello_video) and quite possibly vcgencmd will also be failing at this point.
It might be worth setting start_debug=1 in config.txt and after the crash running:
sudo vcdbg log msg
sudo vcdbg log assert
sudo vcdbg malloc
sudo vcdbg reloc
Ideally run vcgencmd cache_flush before the malloc/reloc commands, although that command may fail depending on how crashed the gpu is.
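To make it easier to attach the output to this issue, the commands above could be wrapped roughly as follows. This is a sketch, not an official tool: collect_gpu_logs and the output file names are made up for illustration, and it assumes start_debug=1 is already set in config.txt and that sudo, vcdbg and vcgencmd are on the PATH.

```sh
#!/bin/sh
# collect_gpu_logs is a hypothetical helper; the file names are arbitrary.
# Usage: collect_gpu_logs OUTPUT_DIR
collect_gpu_logs() {
    out="$1"
    mkdir -p "$out" || return 1
    # The two log dumps do not need the cache flush, so take them first.
    sudo vcdbg log msg    > "$out/log_msg.txt"    2>&1
    sudo vcdbg log assert > "$out/log_assert.txt" 2>&1
    # cache_flush may fail depending on how crashed the gpu is;
    # record the failure and carry on with malloc/reloc anyway.
    if ! vcgencmd cache_flush > "$out/cache_flush.txt" 2>&1; then
        echo "cache_flush failed" >> "$out/cache_flush.txt"
    fi
    sudo vcdbg malloc > "$out/malloc.txt" 2>&1
    sudo vcdbg reloc  > "$out/reloc.txt"  2>&1
}
```

Calling collect_gpu_logs /tmp/gpu-crash after the crash leaves one text file per command in that directory. If vcgencmd blocks instead of failing, the function will block with it, so it may be necessary to run that step by hand.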
Really you need to provide a test app that I can run that provokes the gpu crash. That way I can get the gpu debugger connected and see what the problem is.
Just stating the obvious, but if you are having any stability issues, then disable overclocking before running any tests.
I have seen a similar issue: after the program freezes and I kill it, it cannot be run again, and only a reboot solves the problem. I have a small program, with source, that reproduces the problem. I posted it at this link: https://www.raspberrypi.org/forums/viewtopic.php?f=67&t=121267
But no one has commented.
@bluefishisme did you ever figure out how to fix your issue? I ask because I'm experiencing the exact same symptoms, in that even vcgencmd freezes after OpenVG calls occur:
ioctl(3, 0xc01cc402
It appears to hang there, and all subsequent OpenVG calls fail.
Also, $ sudo vcdbg log msg shows: 412414.170: vcos_abort: Halting
@ykram Do you have an application I can run on raspbian that provokes the vcos_abort? I could at least then determine the backtrace that resulted in that.
I can upload the source that triggers the issue, although it depends on the OpenVG wrapper (ajstarks/openvg repo). It also requires input, as it uses IPC to dictate how things get drawn, but I can provide a dummy app that sends data so it would work. What's the best way to get those to you?
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/raspberrypi/userland/issues/254#issuecomment-221008628
Just zip/tar up the files I need to run and give me a link (e.g. to dropbox/google drive). I don't need the source just something that when run provokes a vcos_abort.
I'll try to get this archived and sent to you today. I have to recompile some network specific parts to make it so that you'll be able to reproduce sending/receiving data that the OpenVG calls depend on.
So I've been trying to reproduce this using an application that reads network data and replays it back to the server, so that the OpenVG client can read, interpret and display the data, but I can't get it to crash this way. If I use the application as intended, however, it crashes randomly (vcos_abort()). Is there any way I can generate a stacktrace/coredump and get you those files to debug?
It's the gpu that is calling vcos_abort, so arm stacktrace/coredump won't help. It's not possible to capture a gpu stacktrace/coredump.
Ah, bummer. I'll work on creating a POC that reproduces the issue and will reply back here as soon as I get something created that I can use to reliably reproduce the bug.
I'm still trying to get a way to reproduce this reliably but in the mean time I did find where the loop/wait seems to occur, if this is helpful:
#0 0x76d8ba40 in do_futex_wait (isem=isem@entry=0x76c29a40 <khrn_queue+76>) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
#1 0x76d8baf4 in __new_sem_wait (sem=0x76c29a40 <khrn_queue+76>) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:69
#2 0x76b51aa4 in vchiu_queue_pop () from /opt/vc/lib/libvchiq_arm.so
#3 0x76c02be8 in rpc_recv () from /opt/vc/lib/libEGL.so
#4 0x76c132dc in vguLine () from /opt/vc/lib/libEGL.so
#5 0x76da9920 in Line () from /usr/lib/libshapes.so
#6 0x43b66666 in ?? ()
As I said, still working on getting something that you can run that'll reproduce this for you.
@ykram any progress on the POC?
@ykram any progress on the POC? (This is second ping...)
We saw a very similar (possibly the same) issue in our firmware. We could reproduce it using hello_triangle.bin: start it, then call tvservice -p and restart hello_triangle.bin. After a few cycles hello_triangle.bin would not start again, and the stack showed it hanging in eglCreateContext.
After a lot of digging around we realized that we had enable_hdmi_status=1 set in config.txt. After removing that the issue did not appear again. Do you possibly have that option set as well @DjPale?
@popcornmix Any thoughts about this?
@julianscheel I've just tried:
while : ; do (./hello_triangle.bin &); sleep 2; tvservice -p; sleep 2; killall hello_triangle.bin; done
with and without enable_hdmi_status=1 and it seems to be running okay. Is that what you meant?
@popcornmix: Can you try again with this script?
#!/bin/sh
while : ; do
tvservice -p
./hello_triangle.bin &
PID=$!
tvservice -p
sleep 5
kill $PID
./hello_triangle.bin &
PID=$!
sleep 5
kill $PID
done
Starting tvservice immediately before hello_triangle seems to be necessary. With this script and enable_hdmi_status=1, I can reliably trigger the bug on a fully updated Raspbian. It usually takes about 10 iterations of the loop for it to happen.
The second invocation of hello_triangle exists just so that it is easier to check whether or not the bug triggered.
edit: It can also take many more iterations than just 10, but so far, the bug always triggers here eventually.
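Since the number of iterations varies so much, a counting variant of the script above may be handy for unattended runs. This is a rough sketch under a big assumption: that a crashed gpu also stops answering vcgencmd (as suspected earlier in this thread, but not confirmed for this particular trigger). run_cycles and the APP, TVSERVICE, PROBE, HOLD and PROBE_TIMEOUT variables are names invented here, and GNU coreutils timeout is assumed.

```sh
#!/bin/sh
# Assumed defaults; override through the environment for other setups.
: "${APP:=./hello_triangle.bin}"
: "${TVSERVICE:=tvservice -p}"
: "${PROBE:=vcgencmd version}"   # assumption: stops answering after a gpu crash
: "${HOLD:=5}"                   # seconds to leave the app running
: "${PROBE_TIMEOUT:=5}"          # seconds to wait for the probe

# run_cycles is a hypothetical helper.
# Usage: run_cycles MAX_ITERATIONS
# Returns 1 as soon as the probe fails or times out, 0 if it never does.
run_cycles() {
    max="$1"
    i=1
    while [ "$i" -le "$max" ]; do
        # Same ordering as the original script: tvservice right before
        # and right after starting the app.
        $TVSERVICE
        $APP &
        pid=$!
        $TVSERVICE
        sleep "$HOLD"
        kill "$pid" 2>/dev/null
        if ! timeout -s KILL "$PROBE_TIMEOUT" $PROBE >/dev/null 2>&1; then
            echo "GPU probe failed after $i iterations"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no hang after $max iterations"
    return 0
}
```

Something like run_cycles 200 would then stop at the first iteration where the probe no longer answers and print how many cycles it took. A failing probe is only a hint; the check from the original script (does hello_triangle still start?) remains the real test.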
Any status on resolving this bug? I'm currently being affected by it, even in 2018 with Raspbian Stretch.
I doubt anyone is looking at it. Unfortunately it's very low priority, and we have oodles of higher-priority stuff to fix/develop.
@camthesaxman If you have a simple test case you can share that triggers the lockup, then we can investigate the issue.
This issue will be closed within 30 days unless further interactions are posted. If you wish this issue to remain open, please add a comment. A closed issue may be reopened if requested.