[SUGGESTION] Any plans to implement the picamera "annotate" functionality?
The picamera library had support for "baking in" text to the output image via firmware using the annotate_text property. Are there any plans to reimplement this ability in picamera2?
I use this functionality — as well as a little brute force and ignorance — to apply timestamps with 1/100th second precision to video. I'd basically push the updated timestamp text to the camera module 100 times per second as part of my main event loop, and let the chips fall where they may as far as which frames received what timestamp.
I tried to reimplement the same thing using a technique similar to this last night on the same Raspberry Pi 3 Model B, and CPU usage exceeded 100% and capture at 25+ frames per second was no longer possible.
One caveat with the above statement: I had to implement the code that writes the timestamp text onto the current video frame by rendering via freetype and blitting via numpy, because Raspberry Pi OS Lite Bullseye's gtk-3 is currently broken. Until that's fixed I won't know how the performance of cv2's text rendering compares, but I'm concerned about performance in general: my example code running at 1280x720@30fps produced only 5 minutes 5 seconds worth of frames during 10 minutes of recording and consumed 110% CPU while doing it. In contrast, the old annotate method worked fine at 1600x1200@30fps, consuming 15% CPU while also serving up the video stream over TLS.
Here's the example code I cobbled together that uses freetype while the gtk3 packages are broken:
#!/usr/bin/python3
import time
# cv2 disabled because:
#$ date
#Tue 11 Oct 2022 10:43:46 AM PDT
#$ printf "%s\n" "$( cat /proc/device-tree/model )"
#-bash: warning: command substitution: ignored null byte in input
#Raspberry Pi 3 Model B Rev 1.2
#$ lsb_release -a
#No LSB modules are available.
#Distributor ID: Raspbian
#Description: Raspbian GNU/Linux 11 (bullseye)
#Release: 11
#Codename: bullseye
#$ sudo apt install python3-opencv
#Reading package lists... Done
#Building dependency tree... Done
#Reading state information... Done
#Some packages could not be installed. This may mean that you have
#requested an impossible situation or if you are using the unstable
#distribution that some required packages have not yet been created
#or been moved out of Incoming.
#The following information may help to resolve the situation:
#
#The following packages have unmet dependencies:
# libgtk-3-0 : Depends: libwayland-client0 (>= 1.20.0) but 1.18.0-2~exp1.1 is to be installed
#E: Unable to correct problems, you have held broken packages.
#import cv2
import os
# NOTE: point this to a .ttf file that actually exists on your system!
FONT_FILE = os.path.join(os.environ['HOME'], ".local/share/fonts/alarm_clock_7mod.ttf")
FONT_SIZE = 16
from picamera2 import MappedArray, Picamera2
from picamera2.encoders import H264Encoder
from picamera2.outputs import FileOutput
# (blit() and paste() lifted from https://stackoverflow.com/questions/28676187/numpy-blit-copy-part-of-an-array-to-another-one-with-a-different-size)
import numpy as np
def blit(a, b, offsets=(0,), as_shapes=False):
    """
    Computes the slices of the overlapping regions of arrays <a> and <b>. If offsets are specified,
    <b> will be shifted by these offsets before computing the overlap.

    Example:
             50
          ┌──────┐
          │      │
        65│   ┌──┼─────┐
          │   │  │     │50
          └───┼──┘     │
              └────────┘
                  55

    <a> is the 65x50 array and <b> is the 50x55 array. The offsets are (32, 18). The returned
    slices are [32:65, 18:50] for <a> and [0:33, 0:32] for <b>.

    Arrays of different dimensions can be used (e.g. 3-dimensional RGB image and 2-dimensional
    grayscale image) but the slices will only go up to min(a.ndim, b.ndim). An offset with more
    elements than that will raise a ValueError.

    Instead of arrays, shapes can be directly passed to the function by setting as_shapes to True.

    :param a: an array object, or a shape tuple if as_shapes is True
    :param b: an array object, or a shape tuple if as_shapes is True
    :param offsets: a sequence of offsets, one per dimension; missing entries default to 0
    :param as_shapes: if True, <a> and <b> are expected to be array shapes rather than arrays
    :return: a multidimensional slice for <a> followed by a multidimensional slice for <b>
    :raises ValueError: if the overlap is zero-dimensional, or offsets has more elements
        than the smaller of the two arrays' number of dimensions
    """
    # Retrieve and check the array shapes and offsets.
    if not as_shapes:
        # np.asarray never copies an existing ndarray; unlike np.array(..., copy=False)
        # it also stays valid under NumPy 2.0, where copy=False means "never copy"
        # and raises if a conversion would be required.
        a_shape, b_shape = np.asarray(a).shape, np.asarray(b).shape
    else:
        a_shape, b_shape = a, b
    n = min(len(a_shape), len(b_shape))
    if n == 0:
        raise ValueError("Cannot overlap with an empty array")
    offsets = tuple(offsets)
    if len(offsets) > n:
        raise ValueError("Offset has more elements than either number of dimensions of the arrays")
    # Pad missing trailing offsets with zeros so every compared dimension has one.
    offsets += (0,) * (n - len(offsets))
    # Compute the per-dimension overlap slices.
    a_slices, b_slices = [], []
    for a_size, b_size, offset in zip(a_shape, b_shape, offsets):
        a_min = max(0, offset)
        a_max = min(a_size, max(b_size + offset, 0))
        b_min = max(0, -offset)
        b_max = min(b_size, max(a_size - offset, 0))
        a_slices.append(slice(a_min, a_max))
        b_slices.append(slice(b_min, b_max))
    return tuple(a_slices), tuple(b_slices)


def paste(a, b, offsets=(0,), copy=True):
    """
    Pastes array <b> into array <a> at position <offsets>.

    :param a: destination array object
    :param b: source array object
    :param offsets: the position in <a> at which <b> is to be pasted
    :param copy: if True, paste into a copy of <a>; if False, modify <a> in place
    :return: either <a> itself (copy=False) or a copy of <a> with <b> pasted on it
    """
    # np.array(a, copy=copy) breaks for copy=False under NumPy 2.0 (copy=False now
    # raises when a copy would be needed), so select the intended behaviour explicitly.
    out = np.array(a) if copy else np.asarray(a)
    a_slice, b_slice = blit(a, b, offsets)
    out[a_slice] = b[b_slice]
    return out
import numpy
import freetype
def _bits(byte):
    """Expand one packed byte of a 1-bpp (FT_LOAD_TARGET_MONO) bitmap row into
    a list of eight 0/1 ints, most significant bit first."""
    return [(byte >> shift) & 1 for shift in range(7, -1, -1)]


def render_numpy(face, text, grayscale=True):
    """
    Render <text> with freetype face <face> into an RGB numpy array.

    Makes two passes over the string: the first measures the bounding box of
    the rendered text (accounting for kerning and per-glyph bearings), the
    second blits each glyph bitmap into the canvas.

    :param face: a freetype.Face with the character size already set
    :param text: the string to render
    :param grayscale: if True, render 8-bit antialiased glyphs; if False,
        render 1-bit monochrome glyphs (NOTE: mono pixel values are 0/1,
        not 0/255 — scale them if you need full-range output)
    :return: a (height, width, 3) numpy.ubyte array with the grayscale value
        replicated across the 3 channels
    """
    flags = freetype.FT_LOAD_RENDER
    if not grayscale:
        flags |= freetype.FT_LOAD_TARGET_MONO
    pen = freetype.FT_Vector(0, 0)
    xmin, xmax = 0, 0
    ymin, ymax = 0, 0
    # Previous character, used for kerning
    previous = 0
    # Pass 1: compute the bounding box of the rendered string.
    for char in text:
        face.load_char(char, flags)
        kerning = face.get_kerning(previous, char)
        previous = char
        pen.x += kerning.x
        # Pen coordinates are 26.6 fixed point; >> 6 converts to whole pixels.
        x0 = (pen.x >> 6) + face.glyph.bitmap_left
        x1 = x0 + face.glyph.bitmap.width
        y0 = (pen.y >> 6) - (face.glyph.bitmap.rows - face.glyph.bitmap_top)
        y1 = y0 + face.glyph.bitmap.rows
        xmin, xmax = min(xmin, x0), max(xmax, x1)
        ymin, ymax = min(ymin, y0), max(ymax, y1)
        pen.x += face.glyph.advance.x
        pen.y += face.glyph.advance.y
    canvas = numpy.zeros((ymax - ymin, xmax - xmin), dtype=numpy.ubyte)
    # Pass 2: blit each glyph into the canvas.
    previous = 0
    pen.x, pen.y = (0, 0)
    for char in text:
        face.load_char(char, flags)
        kerning = face.get_kerning(previous, char)
        previous = char
        pen.x += kerning.x
        bitmap = face.glyph.bitmap  # hoist the repeated attribute lookups
        x = (pen.x >> 6) - xmin + face.glyph.bitmap_left
        y = (pen.y >> 6) - ymin - (bitmap.rows - face.glyph.bitmap_top)
        data = []
        for i in range(bitmap.rows):
            if not grayscale:
                # BUG FIX: the original called bits(), which was never defined
                # anywhere in the file (NameError on the monochrome path).
                # _bits() above unpacks each packed byte into eight 0/1 values.
                row = []
                for j in range(bitmap.pitch):
                    row.extend(_bits(bitmap.buffer[i * bitmap.pitch + j]))
                data.extend(row[:bitmap.width])
            else:
                data.extend(bitmap.buffer[i * bitmap.pitch:i * bitmap.pitch + bitmap.width])
        if len(data):
            Z = numpy.array(data, dtype=numpy.ubyte).reshape(bitmap.rows, bitmap.width)
            # OR the glyph in (row-reversed: freetype's y axis points up).
            canvas[y:y + bitmap.rows, x:x + bitmap.width] |= Z[::-1, ::1]
        pen.x += face.glyph.advance.x
        pen.y += face.glyph.advance.y
    canvas = numpy.flip(canvas, 0)  # flip to image convention (y axis points down)
    # Replicate the single channel into 3 so the result can be pasted onto RGB frames.
    canvas = numpy.repeat(canvas.reshape(-1), 3).reshape(*canvas.shape, 3)
    return canvas
def format_timestamp(dt, omit_tz=False, alt_tz=False, precision=6):# {{{
    # doc {{{
    """\
    Takes a timezone-aware datetime object and makes it look like:

        2019-01-21 14:38:21.123456 PST

    Or, if you call it with omit_tz=True:

        2019-01-21 14:38:21.123456

    With alt_tz=True the numeric UTC offset (e.g. "-0800") is appended instead
    of the timezone name.

    The precision parameter controls how many digits past the decimal point you
    get. 6 gives you all the microseconds, 0 avoids the decimal point altogether
    and you just get whole seconds.
    """
    # }}}
    # BUG FIX: tz_format was computed from alt_tz but never used -- the suffix
    # was hard-coded to dt.strftime("%z"), so alt_tz had no effect and the
    # timezone *name* promised by the docstring was never produced.
    tz_format = "%z" if alt_tz else "%Z"
    # "%Y-%m-%d %H:%M:%S" is the portable spelling of glibc's "%F %T".
    timestamp_txt = dt.strftime("%Y-%m-%d %H:%M:%S")
    if precision > 0:
        # Zero-pad microseconds to 6 digits, then truncate to the requested precision.
        timestamp_txt = "{}.{}".format(timestamp_txt, "{:06d}".format(dt.microsecond)[:precision])
    if not omit_tz and dt.tzinfo is not None:
        timestamp_txt = "{} {}".format(timestamp_txt, dt.strftime(tz_format))
    return timestamp_txt
# }}}
def now_tzaware():# {{{
    # doc {{{
    """
    Convenience function: return the current local time as a timezone-aware
    datetime.

    Equivalent to the old `datetime.datetime.now(tz=pytz.reference.Local)`,
    but implemented with only the standard library: a naive local "now"
    followed by .astimezone() attaches the platform's local timezone.
    (pytz.reference is an undocumented, semi-internal pytz module and is
    best avoided; this also drops the third-party dependency.)
    """
    # }}}
    import datetime
    return datetime.datetime.now().astimezone()
# }}}
if __name__ == '__main__':
    picam2 = Picamera2()
    picam2.configure(picam2.create_video_configuration())

    # Load the face and set its size once, outside the per-frame callback,
    # to keep the hot path as small as possible.
    face = freetype.Face(FONT_FILE)
    face.set_char_size(FONT_SIZE * 64)  # set_char_size takes 1/64ths of a point

    def apply_timestamp(request):
        """Pre-callback: render the current timestamp (1/100 s precision) and
        blit it onto the main stream of this request before encoding."""
        timestamp_text = format_timestamp(now_tzaware(), precision=2)
        canvas = render_numpy(face, timestamp_text)
        with MappedArray(request, "main") as m:
            # copy=False makes paste() write directly into the mapped frame buffer.
            paste(m.array, canvas, (10, 10), copy=False)

    picam2.pre_callback = apply_timestamp
    encoder = H264Encoder(10000000)  # 10 Mbit/s
    picam2.start_recording(encoder, "test.h264")
    time.sleep(600)  # record for 10 minutes
    picam2.stop_recording()
I assume a Raspberry Pi 4 is performant enough to run this in real-time, but the Raspberry Pi 3 is not, so I'm concerned about the claim that the legacy interface is going to be removed while the new library isn't capable of doing the same thing.