firmware Timeout while executing code on QPUs via mailbox interface

Describe the bug

Calls to execute code on the QPUs via the mailbox interface time out randomly.

To reproduce

Download the example here: rpi_qpu_timeout.c
gcc rpi_qpu_timeout.c
i=0; while true; do ((i+=1)); echo $i; sudo ./a.out; if [ $? != 0 ]; then break; fi; done
Observe the program exit with message Execute: Connection timed out
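
For readers without the attachment, here is a minimal sketch of the kind of request that times out. It assumes the message layout used by the `mailbox.c` helper shipped with the userland `hello_fft` example (tag `0x30011`, four request words); the function name `build_execute_qpu_msg` is hypothetical and is not taken from `rpi_qpu_timeout.c`. The sketch only builds the buffer, so it can be compiled and inspected without a Pi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag id for "execute QPU", as used by the hello_fft mailbox.c helper. */
#define TAG_EXECUTE_QPU 0x30011u

/*
 * Fill `msg` with a mailbox property message asking the firmware to run
 * QPU code. Returns the message length in bytes.
 * `msg` must have room for at least 10 32-bit words.
 */
size_t build_execute_qpu_msg(uint32_t *msg, size_t capacity_words,
                             uint32_t num_qpus, uint32_t control_bus_addr,
                             uint32_t noflush, uint32_t timeout_ms)
{
    size_t i = 0;

    assert(capacity_words >= 10);

    msg[i++] = 0;                 /* total size in bytes, patched below */
    msg[i++] = 0x00000000;        /* request code */
    msg[i++] = TAG_EXECUTE_QPU;   /* tag id */
    msg[i++] = 16;                /* value buffer size in bytes */
    msg[i++] = 16;                /* request data size in bytes */
    msg[i++] = num_qpus;          /* number of QPUs to launch */
    msg[i++] = control_bus_addr;  /* bus address of the control list */
    msg[i++] = noflush;           /* nonzero: skip the firmware-side flush */
    msg[i++] = timeout_ms;        /* how long the firmware waits, in ms */
    msg[i++] = 0x00000000;        /* end tag */

    msg[0] = (uint32_t)(i * sizeof *msg);
    return i * sizeof *msg;
}
```

In the helper this layout is based on, the buffer is then passed to the firmware with `ioctl(fd, _IOWR(100, 0, char *), msg)` on `/dev/vcio`. The reported message `Execute: Connection timed out` looks like `perror("Execute")` with `errno` set to `ETIMEDOUT`, though that is an inference from the text, not something checked against the attachment.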

System

System Information
------------------

Raspberry Pi Zero W Rev 1.1
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"

Raspberry Pi reference 2019-09-26
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 80d486687ea77d31fc3fc13cf3a2f8b464e129be, stage2

Linux raspberrypi 5.10.63+ #1457 Tue Sep 28 11:24:51 BST 2021 armv6l GNU/Linux
Revision	: 9000c1
Serial		: 000000007132c8dc
Model		: Raspberry Pi Zero W Rev 1.1
Throttled flag  : throttled=0x0
Camera          : supported=1 detected=1

Videocore information
---------------------

Sep 28 2021 11:33:44 
Copyright (c) 2012 Broadcom
version 778b6a4f3c7d8d48bb63c02c47bcfbac79417bea (clean) (release) (start_x)

Oct 06 '21 04:10 swrenn

I narrowed the problem down. The timeouts happen when I use uncached memory. The mailbox command to allocate memory has a flags parameter, described here. When I use the flag MEM_FLAG_DIRECT, timeouts happen. When I use MEM_FLAG_COHERENT or MEM_FLAG_L1_NONALLOCATING, they don't. Does this mean the QPUs are reading from the L2 cache when they shouldn't? Or is something else going on?
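
To make the comparison concrete, here is a sketch of the allocation request with the three flags in question. The flag values and alias comments are the ones given in the userland `mailbox.h` header, and the layout (tag `0x3000c`, three request words) follows the `hello_fft` helper; the function name `build_mem_alloc_msg` is hypothetical. Only the buffer is built, so nothing here touches the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values and alias notes as listed in the userland mailbox.h header. */
enum {
    MEM_FLAG_DIRECT           = 1 << 2, /* 0xC alias, uncached */
    MEM_FLAG_COHERENT         = 2 << 2, /* 0x8 alias, non-allocating in L2 but coherent */
    MEM_FLAG_L1_NONALLOCATING = MEM_FLAG_DIRECT | MEM_FLAG_COHERENT /* allocating in L2 */
};

/* Tag id for "allocate memory", as used by the hello_fft mailbox.c helper. */
#define TAG_ALLOCATE_MEMORY 0x3000cu

/*
 * Fill `msg` with a mailbox property message that allocates GPU memory.
 * Returns the message length in bytes. On a real system the firmware
 * writes the resulting handle back into msg[5].
 * `msg` must have room for at least 9 32-bit words.
 */
size_t build_mem_alloc_msg(uint32_t *msg, size_t capacity_words,
                           uint32_t size, uint32_t align, uint32_t flags)
{
    size_t i = 0;

    assert(capacity_words >= 9);

    msg[i++] = 0;                    /* total size in bytes, patched below */
    msg[i++] = 0x00000000;           /* request code */
    msg[i++] = TAG_ALLOCATE_MEMORY;  /* tag id */
    msg[i++] = 12;                   /* value buffer size in bytes */
    msg[i++] = 12;                   /* request data size in bytes */
    msg[i++] = size;                 /* bytes to allocate */
    msg[i++] = align;                /* alignment in bytes */
    msg[i++] = flags;                /* one of the MEM_FLAG_* values above */
    msg[i++] = 0x00000000;           /* end tag */

    msg[0] = (uint32_t)(i * sizeof *msg);
    return i * sizeof *msg;
}
```

Passing `MEM_FLAG_DIRECT` as `flags` corresponds to the failing case, while `MEM_FLAG_COHERENT` and `MEM_FLAG_L1_NONALLOCATING` correspond to the two that worked.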

Oct 06 '21 21:10 swrenn

When you mix cached and uncached accesses you have to be very careful to ensure that there are no dirty cache lines outstanding. Having multiple caches only adds to the dangers. I suspect what is happening is that the mmap is creating an ARM cacheable mapping, but the ioctls are not doing the necessary flushing before passing the job to the VPU to execute. The VPU can flush its data cache as much as it likes, but that won't affect the ARM caches.

To make this reliable I think the vcio driver would need to be changed to flush the ARM's caches before launching the QPU. Does that sound plausible, @popcornmix? (I've never written any QPU code...)

Oct 07 '21 08:10 pelwell

I believe the O_SYNC in open("/dev/mem", O_RDWR | O_SYNC) means the mmap will be uncached on the ARM side, so no flushing is required.
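
For reference, the mapping being described can be sketched roughly as follows. It is modelled on the `mapmem` helper from the userland `hello_fft` example and is not taken from the attached reproducer; the names `bus_to_phys` and `map_physical_sync` are hypothetical. Whether `O_SYNC` really yields an uncached ARM mapping is the belief stated above, and the code does not verify it.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Clear the top two bits of a bus address (the VideoCore cache alias). */
uint32_t bus_to_phys(uint32_t bus_addr)
{
    return bus_addr & ~0xC0000000u;
}

/*
 * Map `size` bytes of physical memory starting at `phys_addr` through
 * /dev/mem, opened with O_SYNC. Returns NULL on failure (for example
 * when not running as root).
 */
void *map_physical_sync(uint32_t phys_addr, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    uint32_t offset = phys_addr % (uint32_t)page; /* mmap needs a page-aligned offset */
    void *mem;
    int fd;

    fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    mem = mmap(NULL, size + offset, PROT_READ | PROT_WRITE, MAP_SHARED,
               fd, (off_t)(phys_addr - offset));
    close(fd); /* the mapping stays valid after the descriptor is closed */

    if (mem == MAP_FAILED)
        return NULL;
    return (char *)mem + offset;
}
```

A caller would typically lock the allocation to obtain a bus address, convert it with `bus_to_phys`, and pass the result to `map_physical_sync`.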

Oct 07 '21 15:10 popcornmix

So what do you think the problem might be?

Oct 07 '21 15:10 pelwell

No immediate idea.

I should point out that we are moving toward the kms driver being the default (so the 3D driver runs on the ARM), which is incompatible with this API (driving the 3D hardware from the firmware). You are free to stick with the fkms driver (where this API is available) for the foreseeable future.

The alternative API that works with kms uses an ioctl to get the kernel to schedule a QPU/shader job. The best example of its use is contained in this issue.

Oct 08 '21 16:10 popcornmix

It should be noted that the example is for VC6, and the upstream VC4 KMS driver does not support QPU execution due to the lack of memory protection. See doe300/VC4CL#51.

Oct 09 '21 02:10 Terminus-IMRC

Closing because only KMS is supported these days.

Jan 06 '24 12:01 timg236
