Timeout while executing code on QPUs via mailbox interface
Describe the bug
Calls to execute code on the QPUs via the mailbox interface time out randomly.
To reproduce
- Download the example here: rpi_qpu_timeout.c
- Compile it:
  gcc rpi_qpu_timeout.c
- Run it in a loop until it fails:
  i=0; while true; do ((i+=1)); echo $i; sudo ./a.out; if [ $? != 0 ]; then break; fi; done
- Observe the program exit with the message:
  Execute: Connection timed out
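rpi_qpu_timeout.c itself is not reproduced in this thread. For readers unfamiliar with the mailbox path, the "execute QPU" request is a property-interface message roughly like the one below. This is a minimal sketch that assumes the tag ID 0x00030011 and the field order (num_qpus, control, noflush, timeout) used by the hello_fft sample's mailbox.c. build_execute_qpu_msg is a hypothetical helper. Sending the buffer to the firmware (an ioctl on /dev/vcio) is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag ID for "execute QPU", as used by hello_fft's mailbox.c (assumption). */
#define TAG_EXECUTE_QPU 0x00030011u

/*
 * Hypothetical helper: fill buf with a property-interface request asking the
 * firmware to run code on num_qpus QPUs. Returns the number of 32-bit words
 * written. Layout: buffer header, one tag, end tag.
 */
size_t build_execute_qpu_msg(uint32_t *buf, size_t cap, uint32_t num_qpus,
                             uint32_t control_bus_addr, uint32_t noflush,
                             uint32_t timeout_ms)
{
    size_t i = 0;
    assert(cap >= 10);            /* 2 header + 3 tag header + 4 values + 1 end tag */

    buf[i++] = 0;                 /* total buffer size in bytes, patched below */
    buf[i++] = 0x00000000;        /* request code: process request */

    buf[i++] = TAG_EXECUTE_QPU;   /* tag id */
    buf[i++] = 16;                /* value buffer size in bytes */
    buf[i++] = 16;                /* request: value length in bytes */
    buf[i++] = num_qpus;          /* number of QPUs to start */
    buf[i++] = control_bus_addr;  /* bus address of the per-QPU uniforms/code table */
    buf[i++] = noflush;           /* noflush flag, passed through as given */
    buf[i++] = timeout_ms;        /* how long the firmware waits, in ms */

    buf[i++] = 0x00000000;        /* end tag */
    buf[0] = (uint32_t)(i * sizeof *buf);
    return i;
}
```

With 12 QPUs, a control table at bus address 0xC0001000 and a 2000 ms timeout, the message is 10 words (40 bytes) long.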
System
System Information
------------------
Raspberry Pi Zero W Rev 1.1
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
Raspberry Pi reference 2019-09-26
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 80d486687ea77d31fc3fc13cf3a2f8b464e129be, stage2
Linux raspberrypi 5.10.63+ #1457 Tue Sep 28 11:24:51 BST 2021 armv6l GNU/Linux
Revision : 9000c1
Serial : 000000007132c8dc
Model : Raspberry Pi Zero W Rev 1.1
Throttled flag : throttled=0x0
Camera : supported=1 detected=1
Videocore information
---------------------
Sep 28 2021 11:33:44
Copyright (c) 2012 Broadcom
version 778b6a4f3c7d8d48bb63c02c47bcfbac79417bea (clean) (release) (start_x)
I narrowed the problem down. The timeouts happen when I use uncached memory. The mailbox command to allocate memory has a flags parameter, described here. When I use the flag MEM_FLAG_DIRECT, timeouts happen. When I use MEM_FLAG_COHERENT or MEM_FLAG_L1_NONALLOCATING, they don't. Does this mean the QPUs are reading from the L2 cache when they shouldn't? Or is something else going on?
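To make the comparison concrete, here is a sketch of the allocate-memory request, where the only thing that changes between the failing and working runs is the flags word. The flag values and alias comments follow the firmware wiki's "Mailbox property interface" page as we understand it. The tag ID 0x0003000c and the 12-byte request length match hello_fft's mailbox.c. Both are worth double-checking against those sources. build_alloc_msg is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cache-related allocation flags (values per the firmware wiki; verify there). */
enum mem_flag {
    MEM_FLAG_NORMAL = 0 << 2,           /* normal allocating alias; don't use from ARM */
    MEM_FLAG_DIRECT = 1 << 2,           /* 0xC alias, uncached */
    MEM_FLAG_COHERENT = 2 << 2,         /* 0x8 alias, non-allocating in L2 but coherent */
    MEM_FLAG_L1_NONALLOCATING = (MEM_FLAG_DIRECT | MEM_FLAG_COHERENT), /* allocating in L2 */
};

#define TAG_ALLOCATE_MEMORY 0x0003000cu

/*
 * Hypothetical helper: build an "allocate memory" request.
 * The firmware answers with a handle in the first value word (buf[5]).
 */
size_t build_alloc_msg(uint32_t *buf, size_t cap, uint32_t size,
                       uint32_t align, uint32_t flags)
{
    size_t i = 0;
    assert(cap >= 9);              /* 2 header + 3 tag header + 3 values + 1 end tag */

    buf[i++] = 0;                  /* total size in bytes, patched below */
    buf[i++] = 0x00000000;         /* process request */

    buf[i++] = TAG_ALLOCATE_MEMORY;
    buf[i++] = 12;                 /* value buffer size in bytes */
    buf[i++] = 12;                 /* request: value length in bytes */
    buf[i++] = size;               /* bytes to allocate */
    buf[i++] = align;              /* alignment in bytes */
    buf[i++] = flags;              /* e.g. MEM_FLAG_DIRECT vs MEM_FLAG_COHERENT */

    buf[i++] = 0x00000000;         /* end tag */
    buf[0] = (uint32_t)(i * sizeof *buf);
    return i;
}
```

The returned handle still has to be locked (tag 0x0003000d) to obtain a bus address, and the top two bits of that address reflect the alias the flags selected.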
When you mix cached and uncached accesses you have to be very careful to ensure that there are no dirty cache lines outstanding. Having multiple caches only adds to the dangers. I suspect what is happening is that the mmap is creating an ARM-cacheable mapping, but the ioctls are not doing the necessary flushing before passing the job to the VPU to execute. The VPU can flush its data cache as much as it likes, but that won't affect the ARM caches.
To make this reliable, I think the vcio driver would need to be changed to flush the ARM's caches before launching the QPU. Does that sound plausible, @popcornmix? (I've never written any QPU code...)
I believe the O_SYNC in open("/dev/mem", O_RDWR | O_SYNC) means the mmap will be uncached on the ARM side, so no flushing is required.
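For reference, mapping a firmware allocation through /dev/mem with O_SYNC typically looks something like the sketch below, loosely modelled on hello_fft's mapmem(). It assumes the BUS_TO_PHYS convention from that sample (clear the two alias bits). map_bus_addr is a hypothetical helper, and actually calling it needs root on a Pi.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* hello_fft-style conversion: strip the alias bits to get the ARM physical address. */
#define BUS_TO_PHYS(x) ((x) & ~0xC0000000u)

/*
 * Hypothetical helper: map `size` bytes at firmware bus address `bus`.
 * O_SYNC is what, per the reply above, gives an uncached ARM mapping.
 * Returns NULL on failure.
 */
void *map_bus_addr(uint32_t bus, size_t size)
{
    assert(size > 0);
    uint32_t phys = BUS_TO_PHYS(bus);
    long page = sysconf(_SC_PAGESIZE);
    uint32_t base = phys & ~(uint32_t)(page - 1);  /* mmap offset must be page aligned */
    uint32_t offset = phys - base;

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *mem = mmap(NULL, size + offset, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, (off_t)base);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (mem == MAP_FAILED)
        return NULL;
    return (char *)mem + offset;
}
```

Because the returned pointer is offset into the mapping, a matching munmap() needs the page-aligned base and the length size + offset, so a real program would keep those around.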
So what do you think the problem might be?
No immediate idea.
I should point out that we are moving towards the KMS driver being the default (so the 3D driver runs on the ARM), which is incompatible with this API (driving the 3D hardware from the firmware). You are free to stick with the fkms driver (where this API is available) for the foreseeable future.
The alternative API that works with KMS uses an ioctl to get the kernel to schedule a QPU/shader job. The best example of its use is contained in this issue.
It should be noted that the example is for VC6, and the upstream VC4 KMS driver does not support QPU execution due to the lack of memory protection. See doe300/VC4CL#51.
Closing, because only KMS is supported these days.