picamera2 [HOW-TO] Avoid CMA heap fragmentation when using switch_mode_* frequently

Using the Pi Camera v3 on a CM4, I need to record video at >20 fps and occasionally capture images at full sensor resolution. This requires different sensor modes, so I have to switch between them. A short interruption in the video while taking the full-res image is acceptable, but I want to keep it as short as possible. I need this to work indefinitely, with thousands of mode switches, without restarting the application.

Essentially, this works fine using switch_mode_and_capture_request(), interrupting the video for <0.5s. Unfortunately, the application crashes after a few hundred switches with a memory allocation error. This is no surprise, as https://github.com/raspberrypi/picamera2/blob/6f9202b8eb1ea14c9db572377d778d40d422a846/picamera2/picamera2.py#L1403-L1408 warns of heap fragmentation. After reading this, I changed my code to use switch_mode_capture_request_and_stop(), which runs a little longer but also crashes with a memory allocation error.

Turning picamera2 debug logging on, I can see that after each mode switch new buffers are allocated. This means my application allocates and releases a lot of buffers, which seems to lead to heap fragmentation after some time.

Is there a way to avoid heap fragmentation with lots of mode switches?
@davidplowman If I modified picamera2 to use a set of buffers allocated manually at start, instead of allocating them internally, I could allocate the buffers for both modes once and never reallocate them. Could this be a solution for frequent mode switching?
Should I expect any other problems with lots of mode switching besides CMA heap fragmentation?

This is my minimal example which reproduces the issue:

from picamera2 import Picamera2
import subprocess

picam2 = Picamera2()

video_config = picam2.create_video_configuration(
    main={"size": (1920, 1080), "format": "RGB888"},
    lores={"size": (1920, 1080), "format": "YUV420"},
    raw={"size": (2304, 1296)},
    display=None
)
capture_config = picam2.create_still_configuration(
    main={"size": (4608, 2592), "format": "RGB888"},
    display=None
)

picam2.controls.FrameRate = 24.0
picam2.configure(video_config)
picam2.start()

switch_count = 0
while True:
    # Switch to the still mode, capture one request, and stop the camera.
    request = picam2.switch_mode_capture_request_and_stop(capture_config)
    buffer = request.make_buffer('main')
    metadata = request.get_metadata()
    request.release()
    # Return to video mode as quickly as possible.
    picam2.configure(video_config)
    picam2.start()

    # Encode and save the still after video has resumed.
    img = picam2.helpers.make_image(buffer, capture_config["main"])
    picam2.helpers.save(img, metadata, "file.jpg")

    # Log the free CMA memory after each switch.
    switch_count += 1
    cma = subprocess.check_output("cat /proc/meminfo | grep CmaFree", shell=True)
    print(switch_count, cma)

The output is

[...]
85 b'CmaFree:          312796 kB\n'
[...]
374 b'CmaFree:          237928 kB\n'
[...]
564 b'CmaFree:          181640 kB\n'
[...]
1103 b'CmaFree:          100128 kB\n'
1104 b'CmaFree:          100128 kB\n'
1105 b'CmaFree:           87364 kB\n'
Traceback (most recent call last):
  File "test_switch.py", line 28, in <module>
    request = picam2.switch_mode_capture_request_and_stop(capture_config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1437, in switch_mode_capture_request_and_stop
    return self.dispatch_functions(functions, wait, signal_function, immediate=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1304, in dispatch_functions
    return job.get_result() if wait else job
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/picamera2/job.py", line 79, in get_result
    return self._future.result()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3/dist-packages/picamera2/job.py", line 48, in execute
    done, result = self._functions[0]()
                   ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1354, in switch_mode_
    self.configure_(camera_config)
  File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1085, in configure_
    self.allocator.allocate(libcamera_config)
  File "/usr/lib/python3/dist-packages/picamera2/allocators/dmaallocator.py", line 43, in allocate
    fd = self.dmaHeap.alloc(f"picamera2-{i}", stream_config.frame_size)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/picamera2/dma_heap.py", line 98, in alloc
    ret = fcntl.ioctl(self.__dmaHeapHandle.get(), DMA_HEAP_IOCTL_ALLOC, alloc)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 12] Cannot allocate memory
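
The CmaFree figures logged above can also be read without spawning a shell on every iteration. A minimal sketch, assuming the usual "CmaFree: <number> kB" line format of /proc/meminfo; the helper names parse_cma_free_kb and read_cma_free_kb are hypothetical and not part of picamera2:

```python
import re


def parse_cma_free_kb(meminfo_text):
    """Return the CmaFree value in kB from /proc/meminfo text, or None if absent."""
    match = re.search(r"^CmaFree:\s+(\d+)\s+kB$", meminfo_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def read_cma_free_kb(path="/proc/meminfo"):
    """Read the given meminfo file (hypothetical helper) and return CmaFree in kB."""
    with open(path) as f:
        return parse_cma_free_kb(f.read())


# Sample text in the same format as the log lines in this post.
sample = "MemFree:          812340 kB\nCmaFree:          312796 kB\n"
print(parse_cma_free_kb(sample))
```

Returning an integer makes it easy to log the difference between consecutive switches, which shows whether free CMA memory drops in buffer-sized steps.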

Nov 17 '23 10:11 nzottmann

I think I found a solution. On every configure() call, the old buffers are removed and new buffers are allocated: https://github.com/raspberrypi/picamera2/blob/6f9202b8eb1ea14c9db572377d778d40d422a846/picamera2/picamera2.py#L1087

To avoid this, I created a PersistentAllocator which allocates only once after the first call to allocate():

from picamera2.allocators import DmaAllocator

class PersistentAllocator(DmaAllocator):
    allocated = False
    def allocate(self, libcamera_config):
        if not self.allocated:
            super().allocate(libcamera_config)
            self.allocated = True

I manually created two instances of this allocator. Between stopping the Picamera2 instance and switching mode, I manually swap the allocator. This swap is dispatched to the event loop to ensure the correct order. By using two allocators that each allocate their buffers only once, I keep the same set of buffers for each of the two modes the whole time and avoid CMA heap fragmentation by eliminating reallocation.

Of course, this has to be done with special care, as I have to ensure manually that the allocator matches the mode. Reallocation, for example on a resolution change, has to be handled manually too.
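
The effect of keeping one allocate-once allocator per mode can be sketched without a camera. In the sketch below, FakeDmaAllocator is a hypothetical stand-in for picamera2's DmaAllocator that only counts allocations, and the "video"/"still" dictionary keys are illustrative names. In a real application the selected allocator would be assigned to the Picamera2 instance between stopping and reconfiguring, as described above.

```python
class FakeDmaAllocator:
    """Hypothetical stand-in for picamera2's DmaAllocator; it only counts allocations."""

    def __init__(self):
        self.allocations = 0

    def allocate(self, libcamera_config):
        self.allocations += 1


class PersistentAllocator(FakeDmaAllocator):
    """Allocates on the first call to allocate() and ignores later calls."""

    def __init__(self):
        super().__init__()
        self.allocated = False

    def allocate(self, libcamera_config):
        if not self.allocated:
            super().allocate(libcamera_config)
            self.allocated = True


# One allocator per mode, so each mode keeps its own buffer set.
allocators = {"video": PersistentAllocator(), "still": PersistentAllocator()}

# Simulate 1000 round trips between the two modes.
for _ in range(1000):
    for mode in ("still", "video"):
        allocators[mode].allocate(libcamera_config=None)

total = sum(a.allocations for a in allocators.values())
print(total)
```

However many switches are simulated, each allocator performs a single allocation, which is the property that removes the repeated allocate/release cycle.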

Nov 21 '23 11:11 nzottmann

Hi, thanks for the update. I have been looking at this. The idea of being able to hold on to the buffers and reuse them is clearly good, and is something I've been wanting to do. But I'm also a bit confused as to why we're fragmenting or leaking memory. If you use the libcamera allocator rather than the dma heap one, it seems to run indefinitely. So it seems to me like there's something misbehaving even as things stand, though I don't currently know what it is.

Nov 21 '23 11:11 davidplowman

Thank you for the background info, I can give the LibcameraAllocator a try.

What is the reason for using DmaAllocator instead of LibcameraAllocator? https://github.com/raspberrypi/picamera2/blob/6f9202b8eb1ea14c9db572377d778d40d422a846/picamera2/picamera2.py#L274

Nov 21 '23 11:11 nzottmann

The DmaAllocator allows us to use cached memory buffers, which simply perform faster.

My suspicion is that the current implementation is actually leaking memory buffers from time to time, though I haven't got to the bottom of it yet.

Nov 21 '23 11:11 davidplowman

I think I see the problem. Let me do some testing and then I can post a PR to run through the CI tests.

Nov 21 '23 13:11 davidplowman

So I think this fixes the leak, which will hopefully make things work better.

But I still think it's a good idea to be able to allocate and hold onto buffers. Swapping the allocator object as you've done seems like a reasonable API to me, I guess some checking that the buffers are good for the configuration would be desirable. It would also be nice if that worked generically, whatever the underlying allocator. So perhaps some stuff to think about there.

Nov 21 '23 14:11 davidplowman

Thank you for the quick fix! I will run a test overnight to check whether it works.

But I also think I should avoid frequent reallocation altogether when it is not necessary. A check that the buffers still match would be a nice improvement; I will add this. If one were to implement this generically, what would have to be checked? Buffer count, stream count and picture size per stream, anything else?

Nov 21 '23 15:11 nzottmann

I suppose I was thinking that it should work for any "allocator" (currently there are only the "LibcameraAllocator" and the "DmaAllocator"). Though TBH, I'm not sure why one would ever really want anything other than the DmaAllocator, so maybe it's not so important.

Nov 21 '23 15:11 davidplowman

"So I think this fixes the leak, which will hopefully make things work better."

I can confirm this solves the issue I noticed, which confirms it was a memory leak, not heap fragmentation as I had supposed. Adding unseen_requests with log output, but at first without the release() call, shows that every time the free memory decreases, len(unseen_requests) > 0. After implementing the whole fix, my minimal example runs forever.

Nov 22 '23 09:11 nzottmann

Great, thanks for the confirmation. We're doing another code release imminently, so I'll try to squeeze this one in.

Nov 22 '23 09:11 davidplowman

"Swapping the allocator object as you've done seems like a reasonable API to me, I guess some checking that the buffers are good for the configuration would be desirable."

I implemented this idea and changed my PersistentAllocator to:

class PersistentAllocator(DmaAllocator):
    def allocate(self, libcamera_config):
        # Requested layout: [buffer count, frame size in bytes] per stream.
        buffer_layout_new = [[stream_config.buffer_count, stream_config.frame_size]
                             for stream_config in libcamera_config]
        # Layout of the buffers this allocator currently holds.
        buffer_layout_current = [[len(frame_buffers), frame_buffers[0].planes[0].length]
                                 for frame_buffers in self.frame_buffers.values()]
        # Reallocate only when the layout has changed.
        if buffer_layout_new != buffer_layout_current:
            super().allocate(libcamera_config)

buffer_layout_new and buffer_layout_current are lists holding the buffer count and buffer size for each stream. In my example, they look like this for the video and still configurations respectively:

[[6, 6220800], [6, 3110400], [6, 3732480]]
[[1, 35831808], [1, 14929920]]

The buffers are reallocated only when the buffer layout changes.
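
The sizes in these layouts can be checked by hand from the resolutions in the minimal example. A short sketch under two assumptions: the raw streams use a 10-bit packed format (5 bytes per 4 pixels), and no stride padding is needed for these widths. Both are consistent with the numbers shown, but neither is stated in the thread. The frame_size helper and the format key "RAW10_PACKED" are illustrative names, not picamera2 identifiers.

```python
from fractions import Fraction

# Bytes per pixel for each format; the raw entry is an assumption (10-bit packed).
BYTES_PER_PIXEL = {
    "RGB888": Fraction(3),
    "YUV420": Fraction(3, 2),
    "RAW10_PACKED": Fraction(5, 4),
}


def frame_size(width, height, fmt):
    """Frame size in bytes, ignoring any stride padding."""
    return int(width * height * BYTES_PER_PIXEL[fmt])


# [buffer count, frame size] per stream, with the buffer counts taken from the listing above.
video_layout = [
    [6, frame_size(1920, 1080, "RGB888")],
    [6, frame_size(1920, 1080, "YUV420")],
    [6, frame_size(2304, 1296, "RAW10_PACKED")],
]
still_layout = [
    [1, frame_size(4608, 2592, "RGB888")],
    [1, frame_size(4608, 2592, "RAW10_PACKED")],
]

print(video_layout)  # [[6, 6220800], [6, 3110400], [6, 3732480]]
print(still_layout)  # [[1, 35831808], [1, 14929920]]
```

Because the two configurations differ in both stream count and frame sizes, the comparison in allocate() cannot mistake one mode's buffers for the other's.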

Nov 22 '23 11:11 nzottmann

I find your discussion very interesting, as I am having the same problem. But I have two questions:

How can I use the LibcameraAllocator? It is exposed in picamera2.allocators. However, it seems to me that the following doesn't work:

cam = Picamera2()
cam.allocator = LibcameraAllocator(cam)

How can I use the next branch (preferably without building anything)?
I am trying to take a still image at 9152x6944 (~64MP), but it seems that is too much for picamera2 to handle. Am I assuming correctly?

Feb 01 '24 10:02 theRealProHacker
