
vc4_bo_create -> Failed to allocate from CMA.

Open DeviceIoControl opened this issue 6 years ago • 44 comments

Bug Description: Using any hardware-accelerated application (VLC media player, Kodi, Google Chrome, etc.) for an extended period of time (between 30 minutes and 1 hour) causes CMA allocations to fail (out of memory) for the vc4-fkms-v3d driver. All hardware-accelerated applications running at that point either freeze (turn black) or show serious visual artifacts.

The only fix that I have found, is to disable the vc4-fkms-v3d driver in the /boot/config.txt file (or by switching the GL Driver to "Legacy" mode in raspi-config), but unfortunately, this results in the loss of Hardware acceleration.

To reproduce: This typically occurs when watching a video for more than 30 mins in the VLC media player application. Eventually the video will freeze and any attempt to interact with the application will cause the application to freeze or produce visual artifacting.

Expected behaviour: Hardware-accelerated applications can be used for an extended period of time without freezing or producing visual artifacts.

Actual behaviour: See "Bug Description" section above.

System:
Device Model -> Raspberry Pi 4 Model B (4GB)
OS -> Raspbian GNU/Linux 10 (buster) armv7l
Firmware version -> a51b488198a8c0360b93351682e7432d89d70411
Kernel version -> 4.19.66-v7l+

Logs:
[3912.566190] [drm:vc4_bo_create [vc4]] ERROR Failed to allocate from CMA:
[3912.573209] [drm] kernel: 5120kb BOs (1)
[3912.579505] [drm] V3D: 15116kb BOs (15)
[3912.585862] [drm] V3D shader: 120kb BOs (29)
[3912.592241] [drm] dumb: 48kb BOs (3)
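
For anyone comparing logs across reports, here is a minimal sketch (not part of the original report) that pulls the per-category buffer-object (BO) totals out of a vc4 "Failed to allocate from CMA" dump and adds them up. The function name parse_bo_stats and the regular expression are illustrative choices, and the sample text is copied from the log above.

```python
import re

# Matches summary lines such as "[drm] V3D shader: 120kb BOs (29)".
# The header line starts with "[drm:" rather than "[drm]", so it is skipped.
BO_LINE = re.compile(
    r"\[drm\]\s+(?P<label>[^:\[\]]+?):\s+(?P<kb>\d+)kb BOs \((?P<count>\d+)\)"
)


def parse_bo_stats(log_text):
    """Return {label: (kilobytes, buffer_count)} for every BO summary line."""
    stats = {}
    for match in BO_LINE.finditer(log_text):
        label = match.group("label").strip()
        stats[label] = (int(match.group("kb")), int(match.group("count")))
    return stats


SAMPLE = """\
[3912.566190] [drm:vc4_bo_create [vc4]] ERROR Failed to allocate from CMA:
[3912.573209] [drm] kernel: 5120kb BOs (1)
[3912.579505] [drm] V3D: 15116kb BOs (15)
[3912.585862] [drm] V3D shader: 120kb BOs (29)
[3912.592241] [drm] dumb: 48kb BOs (3)
"""

stats = parse_bo_stats(SAMPLE)
total_kb = sum(kb for kb, _count in stats.values())
print(stats)
print("total:", total_kb, "kb")
```

For this dump the categories add up to roughly 20 MB, which suggests the failure here is not simply a matter of the BOs listed having filled a large CMA region.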

DeviceIoControl avatar Sep 07 '19 21:09 DeviceIoControl

Does increasing cma help? Add cma=512M to config.txt and test again. EDIT: cmdline.txt not config.txt

popcornmix avatar Sep 09 '19 12:09 popcornmix

OK, I've adjusted my settings, and I am testing now. Thanks. I'll comment again if the problem still persists!

DeviceIoControl avatar Sep 09 '19 12:09 DeviceIoControl

Does increasing cma help? Add cma=512M to config.txt and test again.

Well, it seems like the problem is still there, but it's a different message this time (it still seems to be a CMA allocation error, though).

Here are the logs from dmesg.

dmesg:
[ 1374.350936] cma: cma_alloc: alloc failed, req-size: 765 pages, ret: -16
[ 1374.350943] [vc_sm_cma_ioctl_alloc]: dma_alloc_coherent alloc of 3133440 bytes failed
[ 1374.350947] [vc_sm_cma_ioctl_alloc]: something failed - cleanup. ret -12
[ 1375.019739] cma: cma_alloc: alloc failed, req-size: 765 pages, ret: -16
[ 1375.019750] [vc_sm_cma_ioctl_alloc]: dma_alloc_coherent alloc of 3133440 bytes failed
[ 1375.019754] [vc_sm_cma_ioctl_alloc]: something failed - cleanup. ret -12
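
As a quick sanity check on these messages, the sketch below (an illustration, not something from the original comment) converts the requested page count to bytes and looks up the two return codes. It assumes 4 KiB pages, which is consistent with the byte count the driver reports.

```python
import errno
import os

PAGE_SIZE = 4096  # assumption: 4 KiB pages; the log's byte count agrees with this

req_pages = 765  # "req-size: 765 pages" from cma_alloc
req_bytes = req_pages * PAGE_SIZE
print(req_bytes)  # 3133440, the size in the dma_alloc_coherent message

# Kernel return values are negated errno numbers.
for ret in (-16, -12):
    code = -ret
    print(ret, errno.errorcode[code], os.strerror(code))
```

So cma_alloc itself reports -16 (EBUSY) for the roughly 3 MB request, and vc_sm_cma_ioctl_alloc then reports the failure to its caller as -12 (ENOMEM).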

DeviceIoControl avatar Sep 09 '19 15:09 DeviceIoControl

What's the simplest way to reproduce? You say you can see the problem in 30 minutes in VLC. Is that just playing a single file?

e.g. with a clean raspbian buster image if you run vlc (and nothing else) and play a single video longer than 30m would you see this issue? Is there a freely available video file that exhibits this problem?

(this doesn't seem to be my experience - I've left VLC or chrome playing youtube videos overnight a number of times and it still seems happy in the morning - but perhaps there is something related to the format of the videos, or some other difference in our setups).

popcornmix avatar Sep 09 '19 15:09 popcornmix

In my case, it occurs more often when I've switched in and out of Fullscreen mode a couple of times in VLC Media player. (3 - 5 times).

And Google chrome seems to have this problem when viewing websites that use a lot of transitions and effects like: apple.com. (But most times Google chrome seems to "Aww, Snap" (crash) when using Hardware "intensive" websites).

Here is my config if needed.

**/boot/config.txt:**
# For more options and information see
# http://rpf.io/configtxt
# Some settings may impact device functionality. See link above for details

# uncomment if you get no picture on HDMI for a default "safe" mode
#hdmi_safe=1

# uncomment this if your display has a black border of unused pixels visible
# and your display can output without overscan
disable_overscan=1

# uncomment the following to adjust overscan. Use positive numbers if console
# goes off screen, and negative if there is too much border
#overscan_left=16
#overscan_right=16
#overscan_top=16
#overscan_bottom=16

# uncomment to force a console size. By default it will be display's size minus
# overscan.
#framebuffer_width=1280
#framebuffer_height=720

# uncomment if hdmi display is not detected and composite is being output
hdmi_force_hotplug=1

# uncomment to force a specific HDMI mode (this will force VGA)
hdmi_group=1
hdmi_mode=16

# uncomment to force a HDMI mode rather than DVI. This can make audio work in
# DMT (computer monitor) modes
#hdmi_drive=2

# uncomment to increase signal to HDMI, if you have interference, blanking, or
# no display
#config_hdmi_boost=4

# uncomment for composite PAL
#sdtv_mode=2

#uncomment to overclock the arm. 700 MHz is the default.
#arm_freq=800

# Uncomment some or all of these to enable the optional hardware interfaces
dtparam=i2c_arm=on
#dtparam=i2s=on
dtparam=spi=on

# Uncomment this to enable the lirc-rpi module
#dtoverlay=lirc-rpi

# Additional overlays and parameters are documented /boot/overlays/README

# Enable audio (loads snd_bcm2835)
dtparam=audio=on

[pi4]
# Enable DRM VC4 V3D driver on top of the dispmanx display stack
dtoverlay=vc4-fkms-v3d
max_framebuffers=1
cma=512m

[all]
# NOOBS Auto-generated Settings:
hdmi_force_hotplug=1
gpu_mem=256
#hdmi_enable_4kp60=1
start_x=1

DeviceIoControl avatar Sep 09 '19 15:09 DeviceIoControl

Ah sorry - cma=512m should be added to end of cmdline.txt, not config.txt.
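
To confirm that the setting actually took effect after a reboot, one option is to look at the kernel command line and at the CMA counters in /proc/meminfo. The sketch below is only an illustration: the helper names are made up, and the sample strings are invented stand-ins for what /proc/cmdline and /proc/meminfo might contain.

```python
import re


def cma_from_cmdline(cmdline):
    """Return the value of the cma= kernel parameter, or None if it is absent."""
    for token in cmdline.split():
        if token.startswith("cma="):
            return token[len("cma="):]
    return None


def cma_from_meminfo(meminfo):
    """Return (CmaTotal, CmaFree) in kB; a field is None if it is missing."""
    fields = dict(
        re.findall(r"^(CmaTotal|CmaFree):\s+(\d+) kB$", meminfo, flags=re.MULTILINE)
    )
    total = int(fields["CmaTotal"]) if "CmaTotal" in fields else None
    free = int(fields["CmaFree"]) if "CmaFree" in fields else None
    return total, free


# Hypothetical sample data, not taken from the reporter's system.
SAMPLE_CMDLINE = "console=tty1 root=/dev/mmcblk0p2 rootwait cma=512M"
SAMPLE_MEMINFO = (
    "MemTotal:        3999784 kB\n"
    "CmaTotal:         524288 kB\n"
    "CmaFree:          401234 kB\n"
)

print(cma_from_cmdline(SAMPLE_CMDLINE))
print(cma_from_meminfo(SAMPLE_MEMINFO))
```

On the Pi itself the same helpers can be fed open("/proc/cmdline").read() and open("/proc/meminfo").read(). If cma= is missing from the command line, the parameter never reached the kernel.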

popcornmix avatar Sep 09 '19 15:09 popcornmix

Alright, I've adjusted my settings again and I am testing now. Thanks. I'll report back if the issues persist.

DeviceIoControl avatar Sep 09 '19 15:09 DeviceIoControl

Ah sorry - cma=512m should be added to end of cmdline.txt, not config.txt.

The issue still persists (though it did take much longer to occur this time), the same CMA allocation error as mentioned in the previous comment.

It happened in VLC Media player, while watching a 1080p60 video (mp4). Specifically when trying to rewind 30 seconds back in the video.

DeviceIoControl avatar Sep 09 '19 20:09 DeviceIoControl

So, if you don't wind back, then the problem does not occur? Quite an important bit of information that.

JamesH65 avatar Sep 09 '19 20:09 JamesH65

Let us know if you find a sequence of operations that makes the issue happen repeatably and ideally quickly.

We can get the vlc/chromium guy to investigate this, but specific instructions to reproduce make it a lot more likely he'll be able to find and fix the problem.

What is your display resolution? Default skin on VLC?

popcornmix avatar Sep 09 '19 21:09 popcornmix

So, if you don't wind back, then the problem does not occur? Quite an important bit of information that.

It can still freeze even if you don't rewind VLC Media player.

Let us know if you find a sequence of operations that makes the issue happen repeatably and ideally quickly.

Will do! I will make sure to let you know as soon as possible, when I can do so.

EDIT: LOL, Found it! So, literally scrubbing the time-line on VLC Media player while the video is playing can cause this error. I am currently playing a 1920x1080p 60fps video (mp4). (My config has not changed since the last time I mentioned it btw.)

EDIT 2: So the "after-shock" of this error seems to cause other applications to crash when continuing to use them after that error has been produced.

In my case Chromium seems to "Aww, Snap" (crash) the tabs that are using Hardware accelerated rendering such as:

  • Tabs that are playing video (YouTube).
  • Tabs that use high-fidelity graphics / images (Apple.com)

NOTE: It doesn't crash it instantly, it only crashes when you try to interact with the website (after the error has been produced.) and will only stop crashing when you reboot the system (Restarting Chromium won't fix it).

EDIT 3: Making the Chromium tabs crash seems to push out more of the same error, judging by the kernel log shown by "dmesg".

And I forgot to mention that VLC Media player becomes unresponsive when trying to close it. Closing the window will make it "minimise" to system tray but, attempts to close it completely by right-clicking the VLC Media player icon and clicking "Quit" doesn't work. (Even attempting to close it using the "kill -9 " command in the terminal doesn't work either.)

What is your display resolution? Default skin on VLC?

Raspberry Pi is running @ 1920x1080p 60fps

VLC is using the "default skin" but, I am running the LXDE Desktop environment not the default RPD desktop environment. (but this issue has occurred multiple times before I changed my Desktop environment).

DeviceIoControl avatar Sep 09 '19 22:09 DeviceIoControl

I seem to have a similar issue, but the context is a bit different. If this counts as hijacking: Sorry, please tell me to open a new issue in this case.

Context

We have a custom-built UI that runs under wayland, or to be more exact under weston in fullscreen. The display only has a resolution of 720x480. The user interface renders its content using cairo (using the cairo-gl backend), pango (for font rendering) and librsvg (for displaying some vector graphics).

Issue

On some screens the UI suddenly freezes, with symptoms very similar to those described by @DeviceIoControl. The application log just tells me "Draw call returned invalid argument. expect corruption". The screen will simply not update anymore, and even a restart of weston won't fix the problem. The only thing that helps is a hard reboot. It is also not 100% clear when the issue happens; screens that are more memory intensive (with some SVG graphics) seem to trigger it more often. Here are two dmesg logs of the problem:

  • https://pastebin.com/Pk7PQxHJ (using cma=128M@128M)
  • https://pastebin.com/y4aSrn0n (using cma=256M@256M)
  • (I'm currently testing with cma=512M@256M to see if it makes a difference)

Both logs contain also the kernel command line, plenty of stack traces and similar lines like:

Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory

and:

[drm:validate_tile_binning_config [vc4]] *ERROR* Failed to allocate binner memory: -12

System

  • OS: Custom Linux built with yocto (Linux raspberrypi-cm3 4.14.112 #1 SMP Fri Aug 9 13:13:52 UTC 2019 armv7l GNU/Linux)
  • Device Model: Compute Module 3+

Please tell me what other information I can provide to help to debug this.

sahib avatar Oct 01 '19 14:10 sahib

@sahib Your issue is very different. You would appear to be using the full vc4 3D driver (vc4-kms-v3d). The Pi4 has a different 3D block (v3d), and currently only supports the hybrid driver (vc4-fkms-v3d - note the "f"). Both cases would appear to be out of memory scenarios, but are going to be for different reasons.

4.14 is now very out of date, particularly with regard 3D and DRM/KMS driver changes. I would strongly recommend you update to 4.19, if not later. 3D also makes use of a userside library called Mesa, and I'd suggest you ensure that is relatively up to date (19.2.0 is now released and being used by Raspbian, although that is more for v3d fixes than vc4).

6by9 avatar Oct 01 '19 17:10 6by9

Thank you very much @6by9, I'm still trying to wrap my head around all of this.

Your issue is very different.

Sorry then about hijacking this issue. I will open a new issue once I have some new information.

Both cases would appear to be out of memory scenarios, but are going to be for different reasons.

Is increasing the amount of space for the CMA supposed to help? Also, can this be an issue in the application itself (i.e. using insane amounts of memory for unknown reasons) or is this more of a driver issue? I suppose the answer is "both"...

4.14 is now very out of date, particularly with regard 3D and DRM/KMS driver changes. I would strongly recommend you update to 4.19, if not later.

I can try and update once I get to it. Should be easily possible.

3D also makes use of a userside library called Mesa, and I'd suggest you ensure that is relatively up to date (19.2.0 is now released and being used by Raspbian, although that is more for v3d fixes than vc4).

This seems to be at 19.0.8. I can also try to update this.

sahib avatar Oct 01 '19 18:10 sahib

Update on this: I updated the kernel to 4.19.71 and set the CMA memory to 256M (might try lower later). I did not update mesa. So far I have not been able to reproduce the bug. Thanks @6by9 :+1:

If this crops up again, I will open a new issue.

sahib avatar Oct 08 '19 11:10 sahib

Hi, I'm running into the same problem when running a Doom3 sourceport (dhewm3 or d3wasm) on a RPi 3B, even with cma=512M in cmdline.txt (I also tried 256M). The problem is easily/quickly reproducible by just starting a new game. I'm running latest raspbian buster with Kernel 4.19.75-v7+.

To reproduce, first enable OpenGL in raspi-config and reboot as prompted (fake vs full KMS didn't make a difference). It might also make sense to enable the SSH server so you can log into the half-frozen RPi later.

Then build latest dhewm3 git:

  1. sudo apt install build-essential git cmake libsdl2-dev libopenal-dev libjpeg8-dev libvorbis-dev
  2. git clone https://github.com/dhewm/dhewm3.git
  3. cd dhewm3 && mkdir build && cd build
  4. cmake ../neo/
  5. make -j4

Now you should have a dhewm3 executable in dhewm3/build/. Now get the (free demo) game data:

  1. In the same directory your dhewm3/ directory is in, create a doom3data/ directory
  2. cd doom3data
  3. wget https://files.holarse-linuxgaming.de/native/Spiele/Doom%203/Demo/doom3-linux-1.1.1286-demo.x86.run (you can also download the same file from another mirror if you like, it's the official Doom3 Linux x86 demo from back in the day)
  4. sh doom3-linux-1.1.1286-demo.x86.run --tar xf demo/ (this unpacks the game data that's in the .run file somewhere - you should get a demo/ directory that contains just one demo00.pk4 file)
  5. mv demo base (rename the directory to base)

So now your original top dir (probably /home/raspberrypi/ by default) should contain the following stuff:

dhewm3/
dhewm3/build/
dhewm3/build/dhewm3
dhewm3/build/.... (more files, incl. base.so)
dhewm3/neo/
dhewm3/README.md
dhewm3/.... (more stuff)
doom3data/
doom3data/base/
doom3data/base/demo00.pk4
.... (whatever else was already there)

Now run dhewm3, you should get to see the main menu, where we'll do some configuration before restarting the game:

  1. cd dhewm3/build
  2. ./dhewm3 +set fs_basepath ../../doom3data/ starts dhewm3 and tells it where to find the game data
  3. game should start and you should get to see the main menu (if it's super slow it's not using the proper OpenGL driver but llvmpipe => enable OpenGL in raspi-config!)
  4. Click Options -> System -> Low Quality
  5. In the same menu, select "Advanced Options" and set everything to "No" or "Off"
  6. Click "Close Advanced Options", then click "Apply Changes", then "Exit" (on lower right corner)

Now Doom3/dhewm3 is configured to run in the lowest possible settings, and it should have written its config so it remembers those settings after a later crash.

Now you can finally run the game:

  1. Again, ./dhewm3 +set fs_basepath ../../doom3data/ starts dhewm3
  2. In the main menu, select New Game -> Recruit and wait
  3. This will take a bit because the RPi is pretty slow, but eventually the progress bar will stop moving and all of X11 will freeze. You'll still be able to move the mouse pointer (but not do anything with it). Yes, this is a bit confusing
  4. If you ssh into the machine, you'll see a lot of "Failed to allocate from CMA" messages in the syslog and dmesg.
  5. I think that only a full reboot will make OpenGL usable again (just restarting X didn't work for me)

An excerpt from my dmesg:

[   20.911620] fuse init (API version 7.27) # the last "normal", old line from boot
[  253.900107] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[  253.900122] [drm]                            V3D: 488460kb BOs (3503)
[  253.900126] [drm]                     V3D shader:    260kb BOs (64)
[  253.900130] [drm]                           dumb:   9016kb BOs (2)
[  253.900136] [drm]                total purged BO:    712kb BOs (8)
[  253.900149] vc4_v3d 3fc00000.v3d: Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory.
[  254.917700] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[  254.917715] [drm]                            V3D: 498188kb BOs (3603)
[  254.917719] [drm]                     V3D shader:    260kb BOs (64)
[  254.917723] [drm]                           dumb:   9016kb BOs (2)
[  254.917727] [drm]                total purged BO:   1636kb BOs (22)
[  254.918211] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[  254.918215] [drm]                            V3D: 497892kb BOs (3598)
[  254.918219] [drm]                     V3D shader:    260kb BOs (64)
[  254.918222] [drm]                           dumb:   9016kb BOs (2)
[  254.918226] [drm]                total purged BO:   1636kb BOs (22)
[  254.918685] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[  254.918689] [drm]                            V3D: 497892kb BOs (3598)
[  254.918693] [drm]                     V3D shader:    260kb BOs (64)
[  254.918696] [drm]                           dumb:   9016kb BOs (2)
[  254.918700] [drm]                total purged BO:   1636kb BOs (22)
... (it goes on like this forever)
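
To put the numbers in this excerpt into perspective, here is a small worked calculation (an illustration only). It assumes the excerpt was captured during the cma=512M run and that "kb" in the log means KiB; the two data points are copied from the first two V3D lines above.

```python
CMA_KB = 512 * 1024  # assumption: this run used cma=512M, expressed in kB

# (timestamp in seconds, V3D kB, V3D BO count) from the first two dumps above
first = (253.900122, 488460, 3503)
second = (254.917715, 498188, 3603)

delta_seconds = second[0] - first[0]
delta_kb = second[1] - first[1]
delta_bos = second[2] - first[2]

avg_bo_kb = delta_kb / delta_bos    # average size of the newly created BOs
used_fraction = second[1] / CMA_KB  # share of the CMA region held by V3D BOs

print(f"{delta_bos} new BOs, {delta_kb} kB in {delta_seconds:.2f} s")
print(f"average new BO size: {avg_bo_kb:.2f} kB")
print(f"V3D BOs occupy {used_fraction:.1%} of a 512M CMA region")
```

Under those assumptions the V3D BOs alone account for about 95% of the CMA region, and roughly 100 more BOs (about 9.5 MB) appeared within about a second, so a larger cma= value would likely only delay the failure for this workload.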

Basically the same happens when using d3wasm (https://github.com/gabrielcuvillier/d3wasm/), which is based on dhewm3 but has a new renderer, instead of dhewm3 (https://github.com/dhewm/dhewm3/). Even though it's not obvious, it's possible to build d3wasm as a normal Linux binary that will use OpenGL ES 2.0 (Vanilla Doom3 and dhewm3 use OpenGL 1.x with ARB shaders). If you wanna try that as well, https://github.com/gabrielcuvillier/d3wasm/blob/master/BUILD.md#6-enjoy describes how the native d3wasm build works; you can start it just like dhewm3 with +set fs_basepath ../../doom3data/ to tell it where to find the game data.

DanielGibson avatar Jan 04 '20 05:01 DanielGibson

Happens with Kodi as well, on Raspbian buster: leaving its media player on pause for ~30 min, or just leaving Kodi inactive for a long time (anywhere, including the main interface), triggers it.

fierevere avatar Feb 07 '20 15:02 fierevere

Got the same issue twice today. The Pi 3 was updated today (before the issues).

[ 4191.445808] [drm:vc4_bo_create [vc4]] ERROR Failed to allocate from CMA:
[ 4191.445852] [drm] V3D: 126112kb BOs (389)
[ 4191.445860] [drm] V3D shader: 476kb BOs (117)
[ 4191.445867] [drm] dumb: 8116kb BOs (2)
[ 4191.445879] [drm] total purged BO: 1544kb BOs (7)
[ 4191.445897] vc4_v3d 3fc00000.v3d: Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory.

Did not do anything particularly strange, other than some browsing, terminal, cmake + emacs. After the 1st occurrence I updated the gpu memory to 128, and my config is as follows:

dtparam=audio=on

[pi4]
dtoverlay=vc4-fkms-v3d
max_framebuffers=2

[all]
dtoverlay=vc4-fkms-v3d
gpu_mem=128

audetto avatar Feb 08 '20 13:02 audetto

Set this in /boot/config.txt; it seems to help, so far no problems for a day. Plus yesterday's Raspbian update (kernel 4.19.97-v7+).

max_framebuffers=1
gpu_mem=128
cma_lwm=16
cma_hwm=256

fierevere avatar Feb 08 '20 14:02 fierevere

Not sure what max_framebuffers does, but I think on a pi3 it defaults to 1??? Will try and report back.

audetto avatar Feb 08 '20 19:02 audetto

Default is 2; possible values are 1 or 2.

fierevere avatar Feb 08 '20 19:02 fierevere

Not sure what max_framebuffers does, but I think on a pi3 it defaults to 1??? Will try and report back.

Simply limits the number of displays that will be instantiated. So if you set it to one on a Pi4 you only get one HDMI port, which can save some memory. TBH, setting to 2 is fine for almost all use cases, even on devices prior to the 4, pre-KMS, since frame buffers are only created if displays are found.

JamesH65 avatar Feb 09 '20 11:02 JamesH65

Set this in /boot/config.txt, seems to help, so far no problems for a day. max_framebuffers=1 gpu_mem=128 cma_lwm=16 cma_hwm=256

cma_lwm/cma_hwm were removed from firmware over two years ago. max_framebuffers=1 is the default if otherwise not specified in config.txt. Did you add it as a new entry or edit an existing entry of max_framebuffers=2? I can't imagine setting gpu_mem=128 (higher than the default) will help this issue (as effectively the arm will have less memory available).

Identifying the exact line you think helped would be useful (note: it is definitely not cma_lwm/cma_hwm which don't exist).
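
As an aid for spotting such entries, here is a minimal sketch that flags config.txt keys this thread says have no effect there: cma_lwm and cma_hwm (removed from the firmware) and cma (which belongs in cmdline.txt). The function name lint_config and the message wording are made up for illustration, and the key list is deliberately limited to what the thread states.

```python
# Keys the thread identifies as ineffective in config.txt (not an exhaustive list).
NOTES = {
    "cma": "kernel parameter; per this thread it belongs at the end of cmdline.txt",
    "cma_lwm": "removed from the firmware, so it is ignored",
    "cma_hwm": "removed from the firmware, so it is ignored",
}


def lint_config(config_text):
    """Return (line_number, message) pairs for flagged settings."""
    findings = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blanks, comments, and section filters such as [pi4] or [all].
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key in NOTES:
            findings.append((number, f"{key}: {NOTES[key]}"))
    return findings


SAMPLE = """\
max_framebuffers=1
gpu_mem=128
cma_lwm=16
cma_hwm=256
"""

for number, message in lint_config(SAMPLE):
    print(f"line {number}: {message}")
```

Run against the four lines quoted above, it flags lines 3 and 4 and leaves max_framebuffers and gpu_mem as the only candidates that could have changed anything.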

popcornmix avatar Feb 10 '20 12:02 popcornmix

Haven't "crashed" yet since that change. It was happening several times daily. Maybe this is because of the changes, maybe because of the Raspbian kernel update (I wonder what they changed, I haven't seen a changelog).

fierevere avatar Feb 10 '20 13:02 fierevere

If you want to help narrow down this issue, then remove the lines one at a time and see if things are still stable after a day or two. Report back if removing any line had an obvious effect on stability. If it's still stable after the lines are removed, then that is also useful info (presumably the issue has been resolved in a kernel update).

popcornmix avatar Feb 10 '20 14:02 popcornmix

It is definitely not fixed. I was just using LibreOffice + Chrome and it happened (pi 3 fully updated)

I will try to revert back to the legacy driver and see if it still happens.

audetto avatar Feb 11 '20 13:02 audetto

Is there anything else that can be done to track down / solve this? Some more logging? debug info enabled?

The legacy driver clearly does not crash, but it has the bad habit of not sending my monitor to sleep for instance, so I really miss vc4. I can't believe no-one else is seeing this.

audetto avatar Feb 18 '20 21:02 audetto

Isn't this a workaround for this issue? https://www.raspberrypi.org/forums/viewtopic.php?t=223363#p1614476

Our tests are currently running, so I don't know if it really solves this problem, but it seems promising.

j123b567 avatar Apr 02 '20 16:04 j123b567

I can confirm that the workaround works for dhewm3. With /sys/devices/platform/soc/*.v3d/power/control set to on I can start the game and load the first level.
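
For reference, a minimal sketch of applying that setting from a script. Writing "on" to a device's power/control file tells the kernel to keep the device powered instead of runtime-suspending it. The function name force_v3d_power_on and the sysfs_root parameter are illustrative; the demo at the bottom runs against a throwaway directory so it can be tried without root, using the 3fc00000.v3d device name seen in the Pi 3 logs in this thread.

```python
import glob
import os
import tempfile


def force_v3d_power_on(sysfs_root="/sys"):
    """Write "on" to every matching v3d power/control file; return the paths written."""
    pattern = os.path.join(sysfs_root, "devices/platform/soc/*.v3d/power/control")
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as control:
            control.write("on")
        written.append(path)
    return written


# Demo against a fake sysfs tree (no root needed, nothing on the real system changes).
fake_root = tempfile.mkdtemp()
control_dir = os.path.join(fake_root, "devices/platform/soc/3fc00000.v3d/power")
os.makedirs(control_dir)
control_path = os.path.join(control_dir, "control")
with open(control_path, "w") as handle:
    handle.write("auto")

written = force_v3d_power_on(fake_root)
with open(control_path) as handle:
    state = handle.read()
print(written, state)
```

On a real system the call would be force_v3d_power_on() run as root. The value lives in sysfs and does not survive a reboot, so it would have to be reapplied at boot.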

Yamagi avatar Apr 05 '20 16:04 Yamagi

Should this be applied automatically by raspi-config when the OpenGL module is selected?

audetto avatar Apr 05 '20 17:04 audetto