VC4 ISP stuck on upstream
Describe the bug
Hi @6by9 @naushir,
I am trying to upstream support for the VC04 based ISP driver. For this, I have destaged VCHIQ and MMAL drivers, which are already posted upstream.
The rest of the in-progress work is in this branch (based on top of v6.18-rc1): https://github.com/jailuthra/linux/commits/rpi4-isp
This branch includes vc-sm-cma, misc patches for firmware and char drivers, DMA support for BCM2711, and finally the ISP driver. The vc-sm-cma and bcm2835-isp drivers have been moved outside of staging as part of this branch, for posting upstream.
With this I am able to probe all relevant drivers, and the media graph is populated, but if I try to stream with unicam + ISP using libcamera the stream gets stuck without returning a single frame.
Unicam-only capture runs fine though. From the logs, it looks like the ISP does not dequeue any queued buffers. Once the process gets stuck, the firmware is in a hung state and requires a system reboot to recover.
I am using the same devicetree, overlays and firmware as downstream (Bookworm, 6.12). So the only change is on the kernel image and driver modules side.
I have attached the debug logs from libcamera and from vchiq-mmal driver below, for both the working (downstream) and non-working (upstream) case.
My questions are:
- Am I missing some obvious dependency in my branch that is required for the MMAL and remote ISP drivers to function?
- How can I debug this better? The kernel-side debug logs don't seem to give me details of what's going wrong on the firmware side.
Steps to reproduce the behaviour
- Clone https://github.com/jailuthra/linux
- Switch to the rpi4-isp branch
- Compile and install the kernel and modules (using bcm2711_defconfig) without DTBs (use downstream DTBs) to a Raspberry Pi 4B
- Attach a camera module and run libcamera
Step 4 might not work if you've recently updated libcamera, which returns an error on broken media pipelines (with the upstream unicam driver).
I am using v0.5.0, where the pipeline handler works fine, but the capture still initially fails due to broken pipe as the format has to be set manually using media-ctl -V on the unicam subdevice. Once that is done, it proceeds to capture raw frames from the sensor and submits them to the ISP.
This whole mechanism works fine on the downstream kernel as well (if forced to use the newer unicam driver), but gets stuck on the ISP dequeue with my upstream branch.
Device(s)
Raspberry Pi 4 Mod. B
System
Which OS and version?
Raspberry Pi reference 2024-07-04
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 0b115f302a8f1e5bd3523614d7f45b9d447434c7, stage4
Which firmware version?
Aug 20 2025 17:02:58
Copyright (c) 2012 Broadcom
version cd866525580337c0aee4b25880e1f5f9f674fb24 (clean) (release) (start_db)
Logs
libcamera-upstream.txt libcamera-downstream.txt vchiq-mmal-downstream.txt vchiq-mmal-upstream.txt
As you can see, the BUFFER_TO_HOST message never arrives on the upstream branch, while everything before that point is identical in both logs.
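One way to make that comparison less manual is to reduce each log to the set of message names it contains and list what the failing run lacks. This is only a minimal sketch, assuming message types show up in the logs as UPPER_SNAKE_CASE tokens such as BUFFER_TO_HOST. The function names and the two sample strings are made up for illustration and are not excerpts from the attached logs.

```python
import re

# Assumption: message types are logged as UPPER_SNAKE_CASE tokens.
MSG_RE = re.compile(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b")


def message_types(log_text):
    """Return the set of UPPER_SNAKE_CASE tokens found in a log."""
    return set(MSG_RE.findall(log_text))


def missing_in(candidate_log, reference_log):
    """Message types seen in the reference log but never in the candidate."""
    return sorted(message_types(reference_log) - message_types(candidate_log))


# Hypothetical log excerpts, for illustration only.
working = "sent BUFFER_FROM_HOST\nreceived BUFFER_TO_HOST\n"
stuck = "sent BUFFER_FROM_HOST\n"

print(missing_in(stuck, working))  # prints ['BUFFER_TO_HOST']
```

Run against the real vchiq-mmal logs, this would only confirm which message type is absent. It says nothing about why the firmware never sends it.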
Once the stream gets stuck, I see prints on dmesg:
[ +0.000003] bcm2835_mmal_vchiq: timed out waiting for sync completion
[ +0.000008] Firmware transaction 0x00030002 timeout
[ +0.000009] bcm2835-isp bcm2835-isp: bcm2835_isp_node_stop_streaming: Failed disabling i/p port, ret -62
And vclog gives me the same timeout, but nothing before that:
darkapex@rpios:~ $ sudo vclog -a
079043.085: assert( ip_isp_timeout fired ) failed; ../../../../../middleware/openmaxil/components/isp.c::ip_isp_timeout line 1894
Additional context
No response
@jailuthra Does reverting https://github.com/torvalds/linux/commit/919d6924ae9b4bcc9cb1d5ce4b78d5b92665d630 & https://github.com/torvalds/linux/commit/6526402b9bac873d7a64c6e81eb53307d8471f08 fix your issue?
Btw using the downstream DTB with the upstream kernel isn't a good idea. Please use only the upstream DTB for an upstream kernel.
@lategoodbye Yes it does! \o/
darkapex@rpios:~ $ cam -c1 -C5
[0:00:51.225395670] [894] INFO Camera camera_manager.cpp:326 libcamera v0.5.0+62-91bc52dc
[0:00:51.289109522] [895] WARN RPiSdn sdn.cpp:40 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:00:51.297748096] [895] WARN RPI vc4.cpp:393 Mismatch between Unicam and CamHelper for embedded data usage!
[0:00:51.299705300] [895] INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/ov5647@36 to Unicam device /dev/media0 and ISP device /dev/media1
Using camera /base/soc/i2c0mux/i2c@1/ov5647@36 as cam0
[0:00:51.301368596] [894] INFO Camera camera.cpp:1205 configuring streams: (0) 800x600-XRGB8888
[0:00:51.301901207] [895] INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/ov5647@36 - Selected sensor format: 1296x972-SGBRG10_1X10 - Selected unicam format: 1296x972-pGAA
cam0: Capture 5 frames
51.762213 (0.00 fps) cam0-stream0 seq: 000008 bytesused: 1920000
51.822279 (16.65 fps) cam0-stream0 seq: 000009 bytesused: 1920000
51.882345 (16.65 fps) cam0-stream0 seq: 000010 bytesused: 1920000
51.942411 (16.65 fps) cam0-stream0 seq: 000011 bytesused: 1920000
52.002478 (16.65 fps) cam0-stream0 seq: 000012 bytesused: 1920000
darkapex@rpios:~ $ uname -r
6.18.0-rc1-v8+
Btw using the downstream DTB with the upstream kernel isn't a good idea. Please use only the upstream DTB for an upstream kernel.
Will do. I was trying to reduce the variables, will move the relevant parts to upstream DT now that I have a working baseline. Thanks a lot for the fix.
I will also report this bug on the LKML threads for those two commits.
We can also notify @mairacanal directly.
Ack. Btw just reverting https://github.com/torvalds/linux/commit/919d6924ae9b4bcc9cb1d5ce4b78d5b92665d630 was enough.
Thanks - that's helpful.
This was just guessing, because we already got a regression report recently.
@jailuthra Could you please test this patch on top of Linux 6.18 instead of reverting those other patches?
Firmware clocks are going to be slower to use than the direct clock driver, so be on the lookout for performance regressions.
@jailuthra Could you please test this patch on top of Linux 6.18 instead of reverting those other patches?
@lategoodbye Sure, I'm still trying to figure out the changes in upstream DTS to get camera, DMA and UART0 to work.
But if it helps, I tried using my (upstream) kernel image with the downstream DTB with this patch applied (and none of the other reverts), and the ISP got stuck the same as before.
@jailuthra Could you please test this patch on top of Linux 6.18 instead of reverting those other patches?
Checked with upstream DTB as well (have updated my branch with the changes to probe OV5647).
This patch alone does not fix the issue.
Could you please dump the clock tree with upstream kernel / DTB + mentioned patch?
cat /sys/kernel/debug/clk/clk_summary
Here is the clock tree with upstream kernel, dtb and the mentioned patch: clocktree.txt
And, if it helps, here is the clock tree when I also revert https://github.com/torvalds/linux/commit/919d6924ae9b4bcc9cb1d5ce4b78d5b92665d630 (to get the ISP working) clocktree-with-revert.txt
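For comparing two such dumps, a small parser can pull out just the ISP-related rows. This is a sketch under assumptions: it expects the clk_summary layout where each clock row is a name followed by enable count, prepare count, protect count and rate, and it skips every other line. The helper names and the sample text are hypothetical. The sample only reuses the two rates quoted in this thread, and its counts are placeholders rather than values from the attached files.

```python
def parse_clk_summary(text):
    """Map clock name -> counts and rate, taken from clk_summary text."""
    clocks = {}
    for line in text.splitlines():
        fields = line.split()
        # Clock rows: name, then enable/prepare/protect counts and rate (Hz).
        # Header, separator and consumer-only lines do not match and are skipped.
        if len(fields) < 5 or not all(f.isdigit() for f in fields[1:5]):
            continue
        enable, prepare, protect, rate = map(int, fields[1:5])
        clocks[fields[0]] = {
            "enable": enable,
            "prepare": prepare,
            "protect": protect,
            "rate": rate,
        }
    return clocks


def isp_clocks(text, needle="isp"):
    """Return sorted (name, enable count, rate) for clocks matching needle."""
    return sorted(
        (name, c["enable"], c["rate"])
        for name, c in parse_clk_summary(text).items()
        if needle in name
    )


# Illustrative input only; counts are placeholders, not the real dump.
SAMPLE = """\
   clock        enable  prepare  protect  rate
------------------------------------------------
 fw-clk-isp          0        0        0  250000000
 isp                 0        0        0  500000000
 osc                 1        1        0  54000000
"""

for name, enable, rate in isp_clocks(SAMPLE):
    print(f"{name}: enable={enable} rate={rate / 1e6:.0f} MHz")
```

Feeding it the contents of clocktree.txt and clocktree-with-revert.txt in turn would show whether the enable counts or rates of the ISP clocks differ between the two kernels.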
First thing is that fw-clk-isp (fw driver) is at 250 MHz, while isp (ARM driver) is at 500 MHz in the dumps. This is unexpected.
Also, neither of them has a clock consumer in the dump. I would expect the camera driver to consume/use at least one clock.
Looking at your overlay shows a fixed clock.
Shouldn't the camera use the isp clock?
@lategoodbye the camera sensor->csi-rx is a separate pipeline, and if I understand correctly the sensor clock is supplied through an external crystal (which is part of the camera module), and not through the RPi4 SoC/board. Hence we model it as a fixed clock in DT.
The ISP pipeline is purely memory to memory, so it operates on pointers to the raw frame buffers captured from camera.
But I'm not aware of what the ISP and firmware clocking looks like on the SoC side, so I don't know if there should be any consumer for the ISP clock.
fw-clk-isp (fw driver) is at 250 MHz, while isp (ARM driver) is at 500 MHz in the dumps.
This is the case even when I revert 919d6924ae9b4, where the ISP works fine. I wonder if marking the ISP clock as critical is a good enough fix for now, since doing that also solves the issue without reverting the whole commit?
Thanks for the explanation.
If a clock is defined in Linux, there should be a consumer; otherwise the clock framework disables the unused clock after boot. Making the clock critical avoids this issue. You can give it a try to proceed.
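The behaviour described here can be modelled in a few lines. This is a toy model in Python, not kernel code: the class and function names are invented for illustration, and `critical` merely stands in for the kernel's CLK_IS_CRITICAL flag. It assumes only that a critical clock is enabled by the framework at registration, and that the late-boot cleanup gates any clock nobody holds enabled.

```python
from dataclasses import dataclass


@dataclass
class Clock:
    name: str
    critical: bool = False   # stands in for the CLK_IS_CRITICAL flag
    enable_count: int = 0    # references held on the Linux side
    hw_enabled: bool = True  # state left behind by bootloader/firmware


def register(clk):
    # A critical clock is enabled by the framework itself at registration,
    # so it always holds one reference.
    if clk.critical:
        clk.enable_count += 1
    return clk


def disable_unused(clocks):
    """Gate every running clock that nobody holds enabled; return their names."""
    gated = []
    for clk in clocks:
        if clk.enable_count == 0 and clk.hw_enabled:
            clk.hw_enabled = False
            gated.append(clk.name)
    return gated


plain = register(Clock("isp-plain"))
critical = register(Clock("isp-critical", critical=True))
print(disable_unused([plain, critical]))  # prints ['isp-plain']
```

In this model the clock without a consumer is switched off even though the firmware still depends on it, which matches the symptom in this issue; the critical variant survives the cleanup.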
Makes sense. And yes, trying that in https://github.com/jailuthra/linux/commit/078854c37cbc892564be8bf6bc5ce48249a799de fixes the issue without needing reverts.
I'll post that to linux-clk@ for review.
@jailuthra The mentioned patch https://github.com/torvalds/linux/commit/4adc20ba95d472a919f54d441663924e33c92279 is currently in mainline. Does your change still work on top of Linux 6.18-rc3?
@lategoodbye Just rebased on top of v6.18-rc3 and it still works fine.
Sorry, I've been out of the office for a few days.
Exposing the ISP clock was probably an error as it is driven by the firmware, but it appears to date back to 2012 when life was very different. Commit 410cf8252e5 was then a cleanup that exposed it via the Linux driver.
The firmware should be internally requesting the clock and power domain when processing frames, so it's not totally clear as to why that doesn't avoid issues. I'll have a discussion with @naushir about it.
@6by9 no problem. I had posted a patch to mark the ISP clock as critical: https://lore.kernel.org/linux-clk/[email protected]/
But now I think it would be cleaner to drop that clock altogether from Linux. Or, if that is not possible, at least have bcm2835-isp as a Linux user of it, even if it's the firmware that drives it. What do you and @naushir think?
@jailuthra What do you mean by "drop that clock"? The Linux kernel must always be backward compatible with old DTBs (ABI).
@lategoodbye sorry I meant that rather than marking it as critical, we altogether drop the registration of clk_id = RPI_FIRMWARE_ISP_CLK_ID (= 7), by reverting https://github.com/raspberrypi/linux/commit/410cf8252e5c595dd7d20cb176359842c35bc4cb.
Does that count as breaking ABI/DT? Unless I missed some old downstream kernels, I don't see any device tree users of "&firmware_clocks 7" in rpi-6.12 and mainline, as we don't use it in the linux ISP driver (nor is the ISP separately modeled in the device tree, only vchiq mailbox is).
I don't see any device tree users of "&firmware_clocks 7" in rpi-6.12 and mainline
Argh, I now checked arm64/dts and the bcm2712-ds.dtsi uses it for PISP BE :(
I had the BCM2835_CLOCK_ISP in the upstream DTS / clk-bcm2835 in mind, which could make things more complicated.
I've discussed with Naush.
For Pi0-4, all users of the ISP clock are controlled by the firmware. The ISP is also used by both video encode and video decode for "cheap" format conversion. We are never going to get permission from Broadcom to release an open source driver for that ISP block, so there will never be a kernel-side user of it.
For Pi5 the only user is the PISP Backend, and the firmware will never use it.
Removing the clock from the enumeration is going to cause problems, but I will look at disconnecting it within the firmware for Pi0-4 as the kernel should never change it. I'll also review whether there are any other clocks that are in the same situation.