linux
linux copied to clipboard
drm/vc4: Extend BROADCOM_SAND modifier to handle computing automatically
The vc4 HVS can handle almost any column height in the SAND format, and this was signalled via the vendor bits in DRM_FORMAT_MOD_BROADCOM_SAND128.
DRM is now requiring that all potential modifier values are reported, not just the canonical version of them. It is not feasible to report all the potential column heights, therefore add an option to go for a lowest common denominator calculation that computes the column height as height * 3/2 as it is YUV420.
I can't speak to whether this patch is the best solution to the problem, but I can believe that it might be.
I can't speak to whether this patch is the best solution to the problem, but I can believe that it might be.
I'm waiting for @jc-kynesim to check that we can get this working, and then will send it upstream to see what their response is.
The failure mode isn't critical at present, and ideally I want to avoid having to revert a downstream approach if mainline object, hence leaving it as a draft for now.
We'll want to check that 4kp60 hevc decode doesn't degrade with this change, as it was already somewhat borderline on Pi4. My understanding is we previously would choose a larger height to get better sdram bank switching behaviour between adjacent stripes of image.
I don't know how beneficial this actually was, but it's possible it was providing some necessary performance benefit.
Pretty sure that the current code simply rounds height up to next x16 and then has col height as 3/2*height.
We'll want to check that 4kp60 hevc decode doesn't degrade with this change, as it was already somewhat borderline on Pi4. My understanding is we previously would choose a larger height to get better sdram bank switching behaviour between adjacent stripes of image.
I don't know how beneficial this actually was, but it's possible it was providing some necessary performance benefit.
I consulted with jc on this and what was expected.
The V4L2 driver does round the column height up to a multiple of 16 and *3/2, however as long as it sets the buffer height to be the padded height, then DRM should auto-compute the correct number. This is the reason I'm slightly concerned over looking at src_h, not buffer h.
Okay - there is a more complicated algorithm on the firmware side for calculating an optimal height.
The requirement in the source that the stride be an odd multiple of two pages in size is actually over-restrictive. The intention is that in a four-bank SDRAM if a motion compensation source location overlaps a stripe boundary in X and a page transition in Y then the four pages covered are in different banks (call this the "four corners property"). As we have an eight-bank device we get the desired effect if the stride is a multiple of the page size and stride/pagesize & 7 != 0, 1 or 7. On Pi 2 our wacky custom SDRAM gives us 8k (64-line) pages, while on Pi 1 we have 4k (32-line) pages. I think our best bet to satisfy the "four corners" property and avoid aliasing is to pad stripes to an odd multiple of 2k (an odd multiple of 16 lines), and to avoid multiples which bring the banks in adjacent stripes so nearly into alignment that a 23 pixel high motcomp neighbourhood can reach from a bank n page in stripe x to a bank n page in stripe x+1.
(This is used for both the 8-bit and 10-bit column formats).
But I guess if we're not doing this currently then it doesn't affect this PR. It is perhaps something we should investigate as it could help with 4kp60 decode performance.
V4L2 stateful (H264/MPEG4/MPEG2/H263/VC-1) video decode doesn't produce SAND - that was only available via MMAL that officially we have now deprecated.
The only supported generator of SAND is the HEVC decoder, which (AFAIK) has no specific requirements.
The DRM driver will still accept the old SAND modifier with an explicit column height, however it is not advertising all the relevant modifier values (which is the new requirement). If we need to introduce it later, then supporting the relevant 4kB page aligned values is still plausible as it will be some subset of the 4kB (32 line) alignments up to max 256kB (2048 lines), which is a max 64 entries and likely far less.
Looking at what I hope is a vaguely current FFmpeg from JC, DRM gets passed the cropped width/height as the buffer size, which is going to cause grief as DRM then hasn't got the padded height.
Seeing as it also sets the source rectangle there appears to be little reason to set the buffer geometry to the cropped value, and DRM can then use fb->height.
Whilst the height isn't passed directly it is passed implicitly in the plane[1] offset (offset/128)
Though I'll agree I probably shouldn't have cropped w/h in the AddFB2
Following discussion with jc
- fixed as
fourcc_mod_broadcom_param
only extracts an int. - aligned height to a multiple of 2 lines
With the matching drm_fourcc.h change imported, https://github.com/jc-kynesim/rpi-ffmpeg.git branch dev/5.1.4/bcm_auto_mod_1 is working with the brave new world.
Thanks. I'd like to note my mild discomfort with how height now needs to be the original allocation height when previously we could get away with (and much code did) applying cropping directly to the stored frame height. The alternative would be to use the chroma offset to calculate the luma height (which would be reliable) but I can see how this might well play badly with upstream so I guess this is the best we can do.
Unless there are serious objections, then I was going to send this upstream and get their response before looking at merging it here. I'll look at sending that on Wednesday, so please shout soon.
Reworked so that V4L2 uses the magic page aligned heights, and DRM advertises those heights as modifiers.
Seems to work without changes to FFmpeg.
I suggest you remove the constrain2x lines - they no longer do something useful. Alternatively leave then as they were previously and only do the new calculation if the user passed in 0. The former is probably better.