
HVS load higher than 240MHz in vc4 driver

Open · carlonluca opened this issue 3 years ago · 13 comments

Describe the bug

This may not be a bug at all, so apologies for filing it as one, but my question is closely tied to a specific line in the kernel. Feel free to close this if you think I'm in the wrong place.

The problem I'm experiencing is also described here: https://forums.raspberrypi.com/viewtopic.php?t=338157. I'm trying to use the DRM API to render a UI on one plane and, on another plane below it, the framebuffers produced by decoding a 1080p@30 video. The UI is drawn using Qt, which also sets the mode. I then decode a video, import every frame through drmPrimeFDToHandle(), and create a framebuffer with drmModeAddFB2(). The problem is the drmModeSetPlane() call that follows: it works up to a certain destination size, and beyond that it only returns ENOSPC. I based the code on this example: https://github.com/6by9/ffmpeg-drm/blob/master/main.c.
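For reference, drmModeAddFB2() takes per-plane arrays of handles, pitches and offsets. The sketch below fills those arrays for a YU12 (I420) frame stored in a single dmabuf, so the one GEM handle returned by drmPrimeFDToHandle() is reused for all three planes. The contiguous layout with no padding between planes is an assumption: hardware decoders often pad the height (for example to 1088 lines for 1080p), and as far as we can tell the ffmpeg-drm example linked above takes the real offsets and pitches from the decoder's frame descriptor instead of computing them. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the three arrays drmModeAddFB2() expects. */
struct fb_layout {
    uint32_t handles[4];
    uint32_t pitches[4];
    uint32_t offsets[4];
};

/* Hypothetical helper for a YU12 (I420) frame in one buffer.
 * Assumes: luma stride is `stride` bytes, chroma planes are half width and
 * half height with a stride of stride / 2, and the planes follow each other
 * with no padding. `gem_handle` would come from drmPrimeFDToHandle(). */
static struct fb_layout i420_layout(uint32_t gem_handle, uint32_t stride,
                                    uint32_t height)
{
    struct fb_layout l = {0};
    uint32_t chroma_stride = stride / 2;
    uint32_t chroma_height = (height + 1) / 2;

    for (int i = 0; i < 3; i++)
        l.handles[i] = gem_handle;            /* same dmabuf for all planes */

    l.pitches[0] = stride;                    /* Y */
    l.offsets[0] = 0;
    l.pitches[1] = chroma_stride;             /* U (Cb) follows Y */
    l.offsets[1] = stride * height;
    l.pitches[2] = chroma_stride;             /* V (Cr) follows U */
    l.offsets[2] = l.offsets[1] + chroma_stride * chroma_height;
    return l;
}
```

The arrays would then be passed as drmModeAddFB2(fd, width, height, DRM_FORMAT_YUV420, l.handles, l.pitches, l.offsets, &fb_id, 0), where DRM_FORMAT_YUV420 is the YU12 fourcc.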

Looking into the vc4 driver, I see this is where the error comes from: https://github.com/raspberrypi/linux/blob/rpi-5.15.y/drivers/gpu/drm/vc4/vc4_kms.c#L644. I added a log statement there, and the value of hvs_load is 248893440 when two 1080p planes are used, which exceeds the 240MHz (240000000) limit. 720p works fine. 1080p works only up to a certain destination size set in drmModeSetPlane(). If I remove that check, 1080p seems to work well, but I guess this load is considered unacceptable.
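The logged value can be reproduced with a simplified model of the per-plane load computation in vc4 (vc4_plane_calc_load() in vc4_plane.c on rpi-5.15.y, as we understand it): each framebuffer plane contributes crtc_w * crtc_h pixels per frame, the sum is multiplied by the refresh rate, and the result is divided by 4 if nothing is scaled or by 2 if the plane goes through the scaler. The per-plane loads are then summed and compared against 240000000. This is a back-of-the-envelope sketch assuming that formula, not the driver code; the function names are hypothetical, and the plane set is the one visible in the kmsprint output below (console, YUV420 video, RGB UI, 64x64 cursor at 60 Hz).

```c
#include <stdbool.h>
#include <stdint.h>

/* Limit used by the vc4 load tracker: 250 MHz HVS clock minus a margin. */
#define HVS_LOAD_LIMIT 240000000ULL

/* Hypothetical model of one DRM plane's HVS load in clock cycles/second.
 * num_fb_planes: 1 for packed RGB, 3 for YUV420 (Y, U, V).
 * scaled: true if any framebuffer plane needs the scaler, in which case the
 * HVS processes 2 pixels/cycle instead of 4. */
static uint64_t hvs_plane_load(uint32_t crtc_w, uint32_t crtc_h,
                               unsigned num_fb_planes, bool scaled,
                               unsigned vrefresh)
{
    uint64_t load = 0;
    for (unsigned i = 0; i < num_fb_planes; i++)
        load += (uint64_t)crtc_w * crtc_h;    /* destination area per fb plane */
    load *= vrefresh;
    return load >> (scaled ? 1 : 2);
}

/* Total for the setup in this issue, with a video of video_w x video_h. */
static uint64_t example_total_load(uint32_t video_w, uint32_t video_h)
{
    return hvs_plane_load(1920, 1080, 1, false, 60)       /* console (primary) */
         + hvs_plane_load(video_w, video_h, 3, true, 60)  /* YUV420 video */
         + hvs_plane_load(1920, 1080, 1, false, 60)       /* Qt UI */
         + hvs_plane_load(64, 64, 1, false, 60);          /* cursor */
}
```

With a 1920x1080 video this gives 248893440, the value logged above, while 1280x720 gives 145213440 and stays under the limit. Because the video plane alone costs roughly 3 * W * H * 60 / 2 cycles/s, shrinking its destination rectangle eventually brings the total below 240000000, which is consistent with 1080p working only up to a certain destination size.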

Is this expected? I used to be able to render a UI and a 1080p video just fine on a Pi 3 with the old graphics stack based on dispmanx and omxplayer (and even with fkms). It is very possible I'm doing it wrong, but maybe someone can confirm whether this is supposed to work. Thanks.

Steps to reproduce the behaviour

Render a UI through OpenGL on one plane, and repeatedly create framebuffers and set them on another plane at 30fps.

Device(s)

Raspberry Pi 3 Mod. B

System

Raspberry Pi reference 2022-04-04 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 226b479f8d32919c9fe36dd5b4c20c02682f8180, stage2

Mar 24 2022 13:20:54 Copyright (c) 2012 Broadcom version e5a963efa66a1974127860b42e913d2374139ff5 (clean) (release) (start)

Linux raspberrypi 5.15.56-v7+ #1 SMP Fri Jul 29 19:29:51 BST 2022 armv7l GNU/Linux (this kernel is self-built to investigate the issue, but the behavior was identical with the original kernel)

Logs

I collected this: https://pastebin.com/kWYYLqke. The errors I receive in userspace probably correspond to lines like this one (-28 is -ENOSPC): [ 1594.449130] [drm:drm_atomic_check_only [drm]] atomic driver check for 77bab436 failed: -28

Additional context

No response

carlonluca · Aug 09 '22 08:08

Are both planes 32-bit RGBA? Is either plane resized (which is a more expensive operation)? One 1920x1080 32-bit RGBA plane plus one 1920x1080 YUV plane does work on a Pi3 with KMS (e.g. using Kodi).

The output of kmsprint is useful for seeing exactly what is on the display (modetest would work as an alternative, but it is much less readable).

popcornmix · Aug 09 '22 10:08

Thank you @popcornmix for your input. This is the output of kmsprint while my app is running:

Connector 0 (32) HDMI-A-1 (connected)
    EDID (1) = blob-id 290 len 256 (immutable)
    DPMS (2) = 0 (On) [On=0|Standby=1|Suspend=2|Off=3]
    TILE (4) = blob-id 0 (immutable)
    link-status (5) = 0 (Good) [Good=0|Bad=1]
    non-desktop (6) = 0 [0 - 1] (immutable)
    CRTC_ID (20) = object id 89
    left margin (33) = 0 [0 - 100]
    right margin (34) = 0 [0 - 100]
    top margin (35) = 0 [0 - 100]
    bottom margin (36) = 0 [0 - 100]
    Colorspace (37) = 0 (Default) [Default=0|SMPTE_170M_YCC=1|BT709_YCC=2|XVYCC_601=3|XVYCC_709=4|SYCC_601=5|opYCC_601=6|opRGB=7|BT2020_CYCC=8|BT2020_RGB=9|BT2020_YCC=10|DCI-P3_RGB_D65=11|DCI-P3_RGB_Theater=12]
    max bpc (38) = 8 [8 - 12]
    Broadcast RGB (39) = 0 (Automatic) [Automatic=0|Full=1|Limited 16:235=2]
  Encoder 0 (31) TMDS
    Crtc 3 (89) 1920x1080@60.00 148.500 1920/88/44/148/+ 1080/4/5/36/+ 60 (60.00) 0x5 0x60
        OUT_FENCE_PTR (19) = 0 [0 - 18446744073709551615]
        ACTIVE (22) = 1 [0 - 1]
        MODE_ID (23) = blob-id 293 len 68
        VRR_ENABLED (24) = 0 [0 - 1]
        CTM (27) = blob-id 0
        GAMMA_LUT (28) = blob-id 0
        GAMMA_LUT_SIZE (29) = 256 [0 - 4294967295] (immutable)
      Plane 3 (79) fb-id: 291 (crtcs: 3) 0,0 1920x1080 -> 0,0 1920x1080 (XR24 AR24 AB24 XB24 RG16 BG16 AR15 XR15 RG24 BG24 YU16 YV16 YU12 YV12 NV12 NV21 NV16 NV61 RGB8 BGR8 XR12 AR12 XB12 AB12 BX12 BA12 RX12 RA12)
          type (8) = 1 (Primary) [Overlay=0|Primary=1|Cursor=2] (immutable)
          SRC_X (9) = 0 [0 - 4294967295]
          SRC_Y (10) = 0 [0 - 4294967295]
          SRC_W (11) = 125829120 [0 - 4294967295]
          SRC_H (12) = 70778880 [0 - 4294967295]
          CRTC_X (13) = 0 [-2147483648 - 2147483647]
          CRTC_Y (14) = 0 [-2147483648 - 2147483647]
          CRTC_W (15) = 1920 [0 - 2147483647]
          CRTC_H (16) = 1080 [0 - 2147483647]
          FB_ID (17) = object id 291
          IN_FENCE_FD (18) = -1 [-1 - 2147483647]
          CRTC_ID (20) = object id 89
          IN_FORMATS (30) = blob-id 80 len 256 (immutable)
          alpha (81) = 65535 [0 - 65535]
          pixel blend mode (82) = 0 (Pre-multiplied) [Pre-multiplied=0|Coverage=1|None=2]
          rotation (83) = 0x1 (rotate-0) [rotate-0=0x1|rotate-180=0x4|reflect-x=0x10|reflect-y=0x20]
          COLOR_ENCODING (84) = 1 (ITU-R BT.709 YCbCr) [ITU-R BT.601 YCbCr=0|ITU-R BT.709 YCbCr=1|ITU-R BT.2020 YCbCr=2]
          COLOR_RANGE (85) = 0 (YCbCr limited range) [YCbCr limited range=0|YCbCr full range=1]
          CHROMA_SITING_H (86) = 0 [0 - 65536]
          CHROMA_SITING_V (87) = 0 [0 - 65536]
          zpos (88) = 0 [0 - 0] (immutable)
        FB 291 1920x1080
      Plane 8 (130) fb-id: 298 (crtcs: 0 1 2 3) 0,0 1920x1080 -> 0,0 1920x1080 (XR24 AR24 AB24 XB24 RG16 BG16 AR15 XR15 RG24 BG24 YU16 YV16 YU12 YV12 NV12 NV21 NV16 NV61 RGB8 BGR8 XR12 AR12 XB12 AB12 BX12 BA12 RX12 RA12)
          type (8) = 0 (Overlay) [Overlay=0|Primary=1|Cursor=2] (immutable)
          SRC_X (9) = 0 [0 - 4294967295]
          SRC_Y (10) = 0 [0 - 4294967295]
          SRC_W (11) = 125829120 [0 - 4294967295]
          SRC_H (12) = 70778880 [0 - 4294967295]
          CRTC_X (13) = 0 [-2147483648 - 2147483647]
          CRTC_Y (14) = 0 [-2147483648 - 2147483647]
          CRTC_W (15) = 1920 [0 - 2147483647]
          CRTC_H (16) = 1080 [0 - 2147483647]
          FB_ID (17) = object id 298
          IN_FENCE_FD (18) = -1 [-1 - 2147483647]
          CRTC_ID (20) = object id 89
          IN_FORMATS (30) = blob-id 131 len 256 (immutable)
          alpha (132) = 65535 [0 - 65535]
          pixel blend mode (133) = 0 (Pre-multiplied) [Pre-multiplied=0|Coverage=1|None=2]
          rotation (134) = 0x1 (rotate-0) [rotate-0=0x1|rotate-180=0x4|reflect-x=0x10|reflect-y=0x20]
          COLOR_ENCODING (135) = 1 (ITU-R BT.709 YCbCr) [ITU-R BT.601 YCbCr=0|ITU-R BT.709 YCbCr=1|ITU-R BT.2020 YCbCr=2]
          COLOR_RANGE (136) = 0 (YCbCr limited range) [YCbCr limited range=0|YCbCr full range=1]
          CHROMA_SITING_H (137) = 0 [0 - 65536]
          CHROMA_SITING_V (138) = 0 [0 - 65536]
          zpos (139) = 5 [1 - 17]
        FB 298 0x0
      Plane 9 (140) fb-id: 296 (crtcs: 0 1 2 3) 0,0 1920x1080 -> 0,0 1920x1080 (XR24 AR24 AB24 XB24 RG16 BG16 AR15 XR15 RG24 BG24 YU16 YV16 YU12 YV12 NV12 NV21 NV16 NV61 RGB8 BGR8 XR12 AR12 XB12 AB12 BX12 BA12 RX12 RA12)
          type (8) = 0 (Overlay) [Overlay=0|Primary=1|Cursor=2] (immutable)
          SRC_X (9) = 0 [0 - 4294967295]
          SRC_Y (10) = 0 [0 - 4294967295]
          SRC_W (11) = 125829120 [0 - 4294967295]
          SRC_H (12) = 70778880 [0 - 4294967295]
          CRTC_X (13) = 0 [-2147483648 - 2147483647]
          CRTC_Y (14) = 0 [-2147483648 - 2147483647]
          CRTC_W (15) = 1920 [0 - 2147483647]
          CRTC_H (16) = 1080 [0 - 2147483647]
          FB_ID (17) = object id 296
          IN_FENCE_FD (18) = -1 [-1 - 2147483647]
          CRTC_ID (20) = object id 89
          IN_FORMATS (30) = blob-id 141 len 256 (immutable)
          alpha (142) = 65535 [0 - 65535]
          pixel blend mode (143) = 0 (Pre-multiplied) [Pre-multiplied=0|Coverage=1|None=2]
          rotation (144) = 0x1 (rotate-0) [rotate-0=0x1|rotate-180=0x4|reflect-x=0x10|reflect-y=0x20]
          COLOR_ENCODING (145) = 1 (ITU-R BT.709 YCbCr) [ITU-R BT.601 YCbCr=0|ITU-R BT.709 YCbCr=1|ITU-R BT.2020 YCbCr=2]
          COLOR_RANGE (146) = 0 (YCbCr limited range) [YCbCr limited range=0|YCbCr full range=1]
          CHROMA_SITING_H (147) = 0 [0 - 65536]
          CHROMA_SITING_V (148) = 0 [0 - 65536]
          zpos (149) = 6 [1 - 17]
        FB 296 1920x1080
      Plane 23 (280) fb-id: 294 (crtcs: 3) 0,0 64x64 -> 0,0 64x64 (XR24 AR24 AB24 XB24 RG16 BG16 AR15 XR15 RG24 BG24 YU16 YV16 YU12 YV12 NV12 NV21 NV16 NV61 RGB8 BGR8 XR12 AR12 XB12 AB12 BX12 BA12 RX12 RA12)
          type (8) = 2 (Cursor) [Overlay=0|Primary=1|Cursor=2] (immutable)
          SRC_X (9) = 0 [0 - 4294967295]
          SRC_Y (10) = 0 [0 - 4294967295]
          SRC_W (11) = 4194304 [0 - 4294967295]
          SRC_H (12) = 4194304 [0 - 4294967295]
          CRTC_X (13) = 0 [-2147483648 - 2147483647]
          CRTC_Y (14) = 0 [-2147483648 - 2147483647]
          CRTC_W (15) = 64 [0 - 2147483647]
          CRTC_H (16) = 64 [0 - 2147483647]
          FB_ID (17) = object id 294
          IN_FENCE_FD (18) = -1 [-1 - 2147483647]
          CRTC_ID (20) = object id 89
          IN_FORMATS (30) = blob-id 281 len 256 (immutable)
          alpha (282) = 65535 [0 - 65535]
          pixel blend mode (283) = 0 (Pre-multiplied) [Pre-multiplied=0|Coverage=1|None=2]
          rotation (284) = 0x1 (rotate-0) [rotate-0=0x1|rotate-180=0x4|reflect-x=0x10|reflect-y=0x20]
          COLOR_ENCODING (285) = 1 (ITU-R BT.709 YCbCr) [ITU-R BT.601 YCbCr=0|ITU-R BT.709 YCbCr=1|ITU-R BT.2020 YCbCr=2]
          COLOR_RANGE (286) = 0 (YCbCr limited range) [YCbCr limited range=0|YCbCr full range=1]
          CHROMA_SITING_H (287) = 0 [0 - 65536]
          CHROMA_SITING_V (288) = 0 [0 - 65536]
          zpos (289) = 17 [1 - 17]
        FB 294 64x64

I see a 64x64 plane, which I guess is the mouse pointer. Plane 140 is the Qt UI, which should be RGBA. Plane 130 shows the video frames, which should be YUV. Plane 79 is there even when my app is not running; the virtual terminal, maybe? Qt is set to 1920x1080, so I do not think it should require scaling. I explicitly set the destination size to 1920x1080 in drmModeSetPlane() and the source is 1920x1080, so no resize should be needed. Thank you for your help.
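The SRC_* values in the kmsprint output are 16.16 fixed point, which is also the unit drmModeSetPlane() expects for its src_x/src_y/src_w/src_h arguments, while the CRTC_* values are plain pixels. So SRC_W = 125829120 is 1920 << 16 and SRC_H = 70778880 is 1080 << 16, i.e. both overlays are mapped 1:1. The helpers below are hypothetical and only illustrate the conversion and a naive 1:1 check; a 1:1 luma size does not rule out chroma scaling for YUV formats.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert whole pixels to the 16.16 fixed-point format used by the
 * SRC_X/SRC_Y/SRC_W/SRC_H plane properties and drmModeSetPlane(). */
static uint32_t px_to_fixed16(uint32_t px)
{
    return px << 16;
}

/* Integer part of a 16.16 value (fractional bits dropped). */
static uint32_t fixed16_to_px(uint32_t fx)
{
    return fx >> 16;
}

/* Naive check: does the source rectangle map 1:1 onto the destination? */
static bool is_unscaled(uint32_t src_w_fx, uint32_t src_h_fx,
                        uint32_t crtc_w, uint32_t crtc_h)
{
    return src_w_fx == px_to_fixed16(crtc_w) &&
           src_h_fx == px_to_fixed16(crtc_h);
}
```

A full-screen, unscaled call would therefore look like drmModeSetPlane(fd, plane_id, crtc_id, fb_id, 0, 0, 0, 1920, 1080, 0, 0, 1920 << 16, 1080 << 16).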

carlonluca · Aug 09 '22 12:08

The framebuffer is on the primary plane.

If your application doesn't set DRM_CLIENT_CAP_UNIVERSAL_PLANES, then drmModeGetPlaneResources skips the primary and cursor planes - https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_plane.c#L663

Unless told to remove them, planes that are already active will continue to exist when your app runs and opens the DRM device. AIUI, on exit, if your application allocated the buffer, it will be freed and the plane disabled.

Calling drmSetClientCap(fd, DRM_CLIENT_CAP_UNIVERSAL_PLANES, 1); before drmModeGetPlaneResources(fd) should allow you to use the primary plane, and in doing so reduce your rendering load.
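The filtering described above can be modelled in a few lines: without the client cap only overlay planes are advertised (primary and cursor planes are skipped), so the primary plane still holding the console is invisible to the application. This is a hypothetical userspace model of the check in drm_mode_getplane_res(), using the four planes shown in the kmsprint output above; real code would call drmSetClientCap(fd, DRM_CLIENT_CAP_UNIVERSAL_PLANES, 1) and then drmModeGetPlaneResources(fd).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values of the plane "type" property. */
enum plane_type { PLANE_OVERLAY = 0, PLANE_PRIMARY = 1, PLANE_CURSOR = 2 };

struct plane_info {
    uint32_t id;
    enum plane_type type;
};

/* Model of the kernel's filter: unless the client enabled
 * DRM_CLIENT_CAP_UNIVERSAL_PLANES, only overlay planes are listed.
 * Writes the visible plane IDs to `out` and returns how many there are. */
static size_t visible_planes(const struct plane_info *planes, size_t n,
                             bool universal_planes, uint32_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (planes[i].type != PLANE_OVERLAY && !universal_planes)
            continue;                     /* primary/cursor hidden */
        out[count++] = planes[i].id;
    }
    return count;
}
```

With the cap off, only planes 130 and 140 are returned; with it on, plane 79 (the console's primary plane) and the cursor plane become visible too.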

YUV420 will always require the scaler for the chroma planes, even if the luma plane is full screen. (There is also the quirky case of 4k video on a 1080p screen, where the luma is scaled but the chroma isn't; I seem to recall the chroma is still passed through the scaler in that case.)
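The reason is that the YUV420 chroma planes are half the luma size in each direction, so even when the luma plane maps 1:1 onto the screen, each chroma plane is upscaled by 2 horizontally and vertically. A hypothetical sketch, assuming each framebuffer plane is compared against the destination independently, with the subsampling factors passed in explicitly (2x2 for YUV420, 1x1 for packed RGB):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does any framebuffer plane need scaling?
 * hsub/vsub are the chroma subsampling factors (2/2 for YUV420,
 * 2/1 for YUV422, 1/1 for RGB). Plane 0 is luma (or RGB); the
 * remaining planes are chroma, stored at 1/hsub x 1/vsub size. */
static bool needs_scaler(uint32_t src_w, uint32_t src_h,
                         uint32_t crtc_w, uint32_t crtc_h,
                         unsigned num_planes, unsigned hsub, unsigned vsub)
{
    for (unsigned i = 0; i < num_planes; i++) {
        uint32_t w = (i == 0) ? src_w : src_w / hsub;
        uint32_t h = (i == 0) ? src_h : src_h / vsub;
        if (w != crtc_w || h != crtc_h)
            return true;                  /* this plane is resized */
    }
    return false;
}
```

For 4k YUV420 on a 1080p screen, plane 0 is downscaled while the chroma planes happen to match the destination, which is the quirky case mentioned above.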

We did discuss whether there was a case for pruning totally obscured planes from the display list, but I don't think that was ever implemented. It gets complicated when planes are not fullscreen, but even handling the simple fullscreen case would probably have saved you here.

6by9 · Aug 09 '22 13:08

If you want to know explicitly what type a particular plane is, you need to enumerate the plane properties looking for the property named "type": 0=overlay, 1=primary, 2=cursor. See modetest for that sort of enumeration.
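Once the properties have been read, finding the primary plane for a CRTC is a simple search. In real code the (name, value) pairs would come from drmModeObjectGetProperties(fd, plane_id, DRM_MODE_OBJECT_PLANE) plus drmModeGetProperty() for each property ID, and the CRTC bitmask from the possible_crtcs field returned by drmModeGetPlane(); here they are plain arrays so the logic can be shown on its own. The structs and functions are hypothetical, and the sample data mirrors the kmsprint output above (plane 79 is primary and only usable on CRTC index 3).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A (name, value) property pair, as read from the kernel in real code. */
struct prop { const char *name; uint64_t value; };

struct plane_desc {
    uint32_t id;
    uint32_t possible_crtcs;   /* bit i set: usable on CRTC index i */
    const struct prop *props;
    size_t num_props;
};

/* Value of the "type" property (0 = overlay, 1 = primary, 2 = cursor),
 * or -1 if the plane has no such property. */
static int plane_type(const struct plane_desc *p)
{
    for (size_t i = 0; i < p->num_props; i++)
        if (strcmp(p->props[i].name, "type") == 0)
            return (int)p->props[i].value;
    return -1;
}

/* ID of the primary plane usable on CRTC index `crtc_index`, or 0. */
static uint32_t find_primary_plane(const struct plane_desc *planes, size_t n,
                                   unsigned crtc_index)
{
    for (size_t i = 0; i < n; i++)
        if ((planes[i].possible_crtcs & (1u << crtc_index)) &&
            plane_type(&planes[i]) == 1)
            return planes[i].id;
    return 0;
}
```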

6by9 · Aug 09 '22 13:08

Hello, it seems that using plane 79 in drmModeSetPlane() reduces the load to 217789440, as you said. Thanks!

But let me ask one more question on the topic: is the problem here that there is one plane (79) which I do not need, or is it specifically related to the type of plane? Do the three plane types differ in performance, or are they identical? In theory, then, should I always somehow remove plane 79 when I do not need it in an app using DRM/KMS? Should I set Qt to use the primary plane, or is any overlay plane fine? Thanks.

carlonluca · Aug 09 '22 14:08

On the Pi, all planes are identical - you have a dlist entry per active plane, and the HVS just composes everything it is asked to. On other platforms it varies - many don't support any overlay planes, and even the cursor plane is optional. There are DRM drivers for the little SPI displays, and those will typically only expose a single primary plane.

I don't know the full history of why some planes are not advertised to clients by default - that seems a little odd to me, but will have been done for backwards compatibility at some point in the past. It's just annoying that unless you know the magic "password", the functionality is just hidden.

If you want full control of the display, then declare you understand UNIVERSAL_PLANES, and use the primary plane. It seems to be one of those slightly odd quirks of DRM.

6by9 · Aug 09 '22 15:08

If you don't use the primary plane yourself, it will still be present (as an obscured framebuffer console) and will still be using HVS cycles and sdram bandwidth.

RGBA32 1920x1080@60 is 480MB/s of sdram bandwidth which is pretty significant.
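The bandwidth figure is width x height x bytes per pixel x refresh rate. A small sketch (the function name is hypothetical) that also shows the effect of a 16-bit format such as RGB565:

```c
#include <stdint.h>

/* Bytes per second read from SDRAM to scan out one unscaled plane:
 * width * height * bytes per pixel * refresh rate. */
static uint64_t scanout_bytes_per_sec(uint32_t w, uint32_t h,
                                      uint32_t bytes_per_px, uint32_t hz)
{
    return (uint64_t)w * h * bytes_per_px * hz;
}
```

For 32-bit RGBA at 1920x1080@60 this is 497,664,000 bytes/s (about 475MiB/s); a 16-bit format halves it to 248,832,000 bytes/s. In vc4 the memory-bus load appears to be tracked separately from the HVS cycle load, so, as far as we can tell, a smaller pixel format reduces the former but not the latter.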

We've observed that when running an application that is sdram bandwidth constrained (like a software video decoder), the presence of each overlay like this costs about 10% of performance.

popcornmix · Aug 09 '22 15:08

Thanks for the information. I'm still a bit unsure about how the setup should be. When the app starts, should I look for the active plane and somehow remove it (I still cannot find how to do this)? Or should I look for that "used" primary plane and use it instead of another one? It is apparently pretty important to get this right, because otherwise the second plane will simply be impossible to use at 1920x1080.

carlonluca · Aug 09 '22 17:08

There is only one primary plane. The console will only ever have been using the primary plane.

If you make sure you always use the primary plane then the console plane will not be continuing to use resources.

popcornmix · Aug 09 '22 18:08

Are you looking to run your app solely on a Pi, or on other platforms too?

AFAIK fbcon will only ever use the primary plane, and nothing else should be left active when your app runs. Using the primary plane therefore avoids any issues.

DispmanX effectively had the same restrictions, but it didn't enforce the limits, and there was a flag DISPMANX_FLAGS_ALPHA_DISCARD_LOWER_LAYERS that did as it says and threw out all layers below the one that had the flag.

Looking at the vc4 code, it already computes whether each plane fully covers the screen (although I think crtc_w and crtc_h may be allowed to exceed state->crtc->mode.[h|v]display). It needs a little rejuggling, as that flag gets set if the plane is transparent or doesn't cover the whole screen, but if a suitable flag were set there, then https://github.com/raspberrypi/linux/blob/rpi-5.15.y/drivers/gpu/drm/vc4/vc4_hvs.c#L731 could reset dlist_next = dlist_start; whenever it finds a layer that obscures everything below it (enable_bg_fill also inherently becomes 0).
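The proposed pruning (similar in spirit to DISPMANX_FLAGS_ALPHA_DISCARD_LOWER_LAYERS) can be sketched outside the driver: walk the layers from the top down and stop at the first one that is opaque and covers the whole mode; everything below it can be left out of the display list. This is a hypothetical model, not vc4 code. It assumes "opaque" means no per-pixel alpha and a fully opaque global alpha, uses containment rather than equality because CRTC_W/CRTC_H may exceed the mode, and ignores the load-tracking and re-exposure issues raised below.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct layer {
    int32_t x, y;    /* CRTC_X / CRTC_Y */
    uint32_t w, h;   /* CRTC_W / CRTC_H */
    bool opaque;     /* no per-pixel alpha and global alpha == 0xffff */
};

/* True if the layer's destination rectangle contains the whole mode. */
static bool covers_screen(const struct layer *l, uint32_t mode_w,
                          uint32_t mode_h)
{
    return l->x <= 0 && l->y <= 0 &&
           (int64_t)l->x + l->w >= mode_w &&
           (int64_t)l->y + l->h >= mode_h;
}

/* Layers are ordered bottom (index 0) to top. Returns the index of the
 * lowest layer that still has to be composed; layers below it are hidden. */
static size_t first_visible_layer(const struct layer *layers, size_t n,
                                  uint32_t mode_w, uint32_t mode_h)
{
    for (size_t i = n; i-- > 0;) {
        if (layers[i].opaque && covers_screen(&layers[i], mode_w, mode_h))
            return i;
    }
    return 0;
}
```

For the stack in this issue (console, full-screen YUV video, Qt UI with alpha, cursor), the walk stops at the video layer, so the console on the primary plane would no longer cost HVS cycles.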

There are a number of potential holes, though, largely in allowing that through the HVS load code, and in ensuring that if you then move a plane so that it no longer covers the whole screen, the dlist gets fully recomputed and the layers that are now exposed are added back in.

6by9 · Aug 09 '22 18:08

> There is only one primary plane. The console will only ever have been using the primary plane.
>
> If you make sure you always use the primary plane then the console plane will not be continuing to use resources.

If this rule applies to all Pi models, then it should be simple. At the moment, my only target device is the Raspberry Pi.

Anyhow, since the driver cannot currently discard covered planes automatically, would it be possible for the app to "remove" the plane itself when it is not needed? The UI will have to live in an overlay plane. When a video is to be presented, I'll use the primary plane, as discussed. But while the app needs no video and the UI covers the entire surface, is there a way to manually instruct the kernel to ignore that plane? Maybe by setting some "null" value in drmModeSetPlane()? I hope I made myself clear. Thanks to both of you for the help.

carlonluca · Aug 09 '22 23:08

I have struggled with this same issue and found only two workarounds (at least when using the legacy DRM/KMS interfaces like drmSetModeCrtc and drmSetModeOverlay):

#1- Using drmSetModeCrtc() to change the main framebuffer to 16-bit seems to reduce the memory bandwidth overhead. I'm not sure whether that would also reduce the renderer overhead.

#2- Directly manipulating the display list pointer (updating the pointer to skip the first frame). This is tricky, non-portable, requires the app to have privileges, etc. Still, it is the only working solution I have found.

Though popcornmix seems to imply you can pass a primary framebuffer to drmSetModeOverlay (assuming DRM_CLIENT_CAP_UNIVERSAL_PLANES), I was unable to make that work (it returns "invalid argument"). It is possible that atomic mode would allow it.

vrazzer · Aug 10 '22 14:08

Thanks for the help. I still have to find a proper way to "disable" the primary plane, but otherwise I achieved the desired result. Here is a demo comparing it with the old omxplayer approach: https://youtu.be/LbT1RISBklk. I think a proper way to "disable" the primary plane when it is not needed would be useful.

carlonluca · Aug 17 '22 13:08