linux
linux copied to clipboard
VC4 Memory Safety Improvements
Hi,
Here's a fairly big series that address a lot of memory safety issues in the vc4 driver that were mostly exposed when unbinding the driver.
The main culprit is that the component drivers usually allocate their main structure with a devm_kzalloc() variant that is free'd when the device is removed. However, when the device is actually removed the DRM device doesn't go away instantly but rather when the last user closes its fd. It means that the framework will still have pointers to the structures that have been allocated by the driver but were freed, so dangling pointers leading to use-after-free errors.
This is addressed by reworking the structures layout and then allocating with a drmm_ call instead that will free it when the DRM device is unregistered.
This was tested by doing (on a Pi4): echo "fe207000.pixelvalve" > /sys/bus/platform/drivers/vc4_crtc/unbind echo "fe207000.pixelvalve" > /sys/bus/platform/drivers/vc4_crtc/bind
We used to have various crashes and memory corruptions before, and it's now stable, with KASAN not reporting any memory-related issue.
With a quick check it compiled and booted okay. The unbind/bin pair worked as expected. Hotplug/unplug also okay. I'll run with it and keep an eye on anything unexpected.
fkms works, but triggers a WARN: http://sprunge.us/4beqb2
Not surprising as we have no vc4->hvs in this mode.
There's numerous ways we could hide this (downstream). e.g.
--- a/drivers/gpu/drm/vc4/vc4_hvs.c
+++ b/drivers/gpu/drm/vc4/vc4_hvs.c
@@ -969,7 +969,7 @@ int vc4_hvs_debugfs_init(struct drm_minor *minor)
int ret;
if (!vc4->hvs)
- return -ENODEV;
+ return 0;
if (!vc4->is_vc5)
Unbind on Pi3+ caused a panic:
echo "3f206000.pixelvalve" | sudo tee /sys/bus/platform/drivers/vc4_crtc/unbind
http://sprunge.us/4U0yUo should that work, or is that a TODO?
Unbind on Pi3+ caused a panic:
echo "3f206000.pixelvalve" | sudo tee /sys/bus/platform/drivers/vc4_crtc/unbindhttp://sprunge.us/4U0yUo should that work, or is that a TODO?
It looks like a WARN, not a panic though?
From the stack trace, it seems to be that WARN that triggers: https://github.com/mripard/rpi-linux/blob/rpi/5.15-hotplug-rework/drivers/gpu/drm/vc4/vc4_bo.c#L728
And from the comment, it probably occurs when the userspace still holds that buffer and accesses it while the device unregisters.
Does it happen every time?
Either way, the Pi3 will break if we unbind and rebind at the moment because the clock is disfunctional, so it might not be a big deal.
fkms works, but triggers a WARN: http://sprunge.us/4beqb2
Not surprising as we have no
vc4->hvsin this mode. There's numerous ways we could hide this (downstream). e.g.--- a/drivers/gpu/drm/vc4/vc4_hvs.c +++ b/drivers/gpu/drm/vc4/vc4_hvs.c @@ -969,7 +969,7 @@ int vc4_hvs_debugfs_init(struct drm_minor *minor) int ret; if (!vc4->hvs) - return -ENODEV; + return 0; if (!vc4->is_vc5)
I've added a patch that fixes it. I've not squashed it into the original patch to ease the rebase on later versions
Either way, the Pi3 will break if we unbind and rebind at the moment because the clock is disfunctional, so it might not be a big deal.
Could you elaborate on the clock issue? I'm not immediately recalling which issue this refers to.
Does it happen every time?
Yes. (more than ten times in a row). Sometimes, some seconds after the ubind we lock up hard (I have time to check for WARN in dmesg). Sometimes this doesn't occur and I can actually bind again and display reappears.
That was with boot to desktop (and glamor) enabled. With boot to console, there is no WARN. But the intermittent hangs still occur.
@mripard current version of this PR produces no output on Pi4. No clear errors in dmesg But:
$ kmsprint
terminate called after throwing an instance of 'std::runtime_error'
what(): No modesetting DRM card found
Aborted
The initial version of PR was working for me on pi4.
I'm not sure what's going on, the only difference between the two is that commit https://github.com/raspberrypi/linux/pull/5087/commits/4496c8340dfc1d9186a783b8a41523e8c02c51e9
Nothing has changed in your setup (configuration, etc) ? It looks like the DRM device never probes in the first place
Switching to rpi-5.15.y works as expected (with no other changes to config.txt etc).
I've bisected the tree and "drm/vc4: hdmi: Switch to device-managed ALSA initialization" is the commit that breaks it for me.
You might assume the reason for the breakage is the newly added:
ret = devm_add_action_or_reset(dev, vc4_hdmi_audio_codec_release, vc4_hdmi);
returning non-zero and so causing the bind to fail, but it does return 0 (on the 4 calls).
If I just comment out that line, I get the display back.
I'm a bit confused, and I don't get that here. Could you share your kernel configuration and config.txt?
I'm using your kernel branch directly. The config is just bcm2711_defconfig. config.txt
disable_fw_kms_setup=1
os_prefix=test/
enable_uart=1
dtoverlay=disable-bt
camera_auto_detect=1
disable_overscan=1
max_framebuffers=2
dtoverlay=vc4-kms-v3d,cma-512
dtparam=audio=on
force_turbo=1
hdmi_enable_4kp60=1
cmdline.txt
console=tty1 console=ttyS1,115200 nfsroot=10.3.14.138:/home/dom/bullseye,vers=3 rw ip=dhcp rootwait log_buf_len=8M
I did try a libreelec build with this kernel tree. That did boot and give a picture, but it wasn't stable. After playing a video (so a couple of hdmi mode sets) it got very sluggish. ssh in was slow. top didn't report high user cpu, but the "wa" value was very high (48%). Three different reboots all ended up with the sluggish behaviour. Switching back to 5.15 kernel was fine.
I've started to debug this further, and it's not related to anything in config.txt but instead seems linked to the kernel config. My configuration is working fine, but bcm2711_defconfig doesn't.
vc4 indeed doesn't probe, but it's not clear why at this point, I'll look into it.
I've found the original source of the error, it was due to the hdmi-codec module having a fairly short window to be loaded before we were failing and we were never meeting it. I'm not sure why it started to be an issue with this PR, but well....
However, while rebasing on the current 5.15 branch I encountered a regression, possibly due to a bad conflict resolution. I'm looking for the solution at the moment.
It was a poor conflict resolution indeed. I've pushed a new version that works and without conflicts (hopefully)
This looks better. Not seeing the vc4 probe issue. I can unbind and rebind on vc4.
Usual tests like kodi are working on pi4 and pi3.
I was testing your tree directly (i.e. 5.15.61). There is a new conflict with latest (5.15.65), so needs a rebase.
I just pushed a new branch based on the current rpi-5.15.y branch
github is showing "This branch cannot be rebased due to too many changes" although it looks to be rebased on rpi-5.15.y to me. I'm sure we can merge it offline though.
@6by9 any comments?
github is showing "This branch cannot be rebased due to too many changes" although it looks to be rebased on rpi-5.15.y to me. I'm sure we can merge it offline though.
Does it? It's weird, I have the message that it doesn't have any conflict here.
I see the same as @popcornmix - it's a straightforward (if enormous) fast-forward merge on the command line, but the GitHub GUI is refusing to touch it.
@6by9 any comments?
These look to be largely backports from drm-next, so should all be good. I'm not seeing anything nasty from a quick scan of the changes.
Merged offline (which github seems to have treated as a merge).
@mripard does upstream/rpi-6.0.y want some of these commits?
Just checked and while unbind seems clean on rpi-6.0.y, binding again results in:
/sys/bus/platform/drivers/vc4_crtc/bind: Device or resource busy
and in dmesg
[ 49.244491] vc4_crtc: probe of fe207000.pixelvalve failed with error -16
Most of them also apply to 6.0 indeed. The patches are queued for 6.1
Since the series is fairly large, I can rebase it and send it to you if you'd prefer?
Let me try the usual forward-port method, and I'll let you know if it I hit a snag.
That felt like death by a thousand cuts - so many small merge conflicts - but it's done now.