The live stream Python API easily hangs (face landmarks)
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)?
Yes
OS Platform and Distribution
Ubuntu 22.04
MediaPipe Tasks SDK version
0.10.13
Task name (e.g. Image classification, Gesture recognition etc.)
face landmarker
Programming Language and version (e.g. C++, Python, Java)
python
Describe the actual behavior
hangs on detector.detect_async()
Describe the expected behaviour
It should not hang even when the Python callback is not fast. Otherwise the live stream API is not useful for real Python applications, as it forces any work with the received image onto yet another thread, which distorts the callback design just for the benefit of the back-pressure.
Standalone code/steps you may have used to try to get what you need
The live stream API gets stuck if even a small amount of processing takes place in the registered Python callback:
```python
from pathlib import Path
import cv2
import time
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
FaceLandmarker = mp.tasks.vision.FaceLandmarker
FaceLandmarkerOptions = mp.tasks.vision.FaceLandmarkerOptions
FaceLandmarkerResult = mp.tasks.vision.FaceLandmarkerResult
VisionRunningMode = mp.tasks.vision.RunningMode

images_output_path = 'face landmarks images'
Path(images_output_path).mkdir(parents=True, exist_ok=True)

timestamps_ms = set()
written_image_number = 0


def handle_pipeline_prediction_callback(
        inference: FaceLandmarkerResult,
        output_image: mp.Image,
        timestamp_ms: int):
    # simulate a small amount of work in the callback
    time.sleep(0.05)


def main():
    # downloaded from 'https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task':
    model_path = 'models/face_landmarker.task'
    options = FaceLandmarkerOptions(
        base_options=BaseOptions(model_asset_path=model_path),
        running_mode=VisionRunningMode.LIVE_STREAM,
        result_callback=handle_pipeline_prediction_callback)
    detector = FaceLandmarker.create_from_options(options)

    stream = cv2.VideoCapture(0)
    # noinspection PyUnresolvedReferences
    stream.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*'MJPG'))
    stream.set(cv2.CAP_PROP_FPS, 30)
    stream.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    stream.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
    stream.set(cv2.CAP_PROP_BUFFERSIZE, 1)
    if not stream.isOpened():
        raise Exception('failed opening camera')

    image_number = 0
    last_image_timestamp = None
    try:
        while True:
            stream.grab()
            success, image = stream.retrieve()
            if not success:
                raise Exception('failed retrieving camera image')
            image_timestamp = int(stream.get(cv2.CAP_PROP_POS_MSEC))
            if last_image_timestamp is not None:
                if image_timestamp <= last_image_timestamp:
                    raise ValueError(
                        f'camera image times are not monotonically increasing: '
                        f'{last_image_timestamp}, {image_timestamp}')
            last_image_timestamp = image_timestamp
            print(f'pushing image number {image_number} having image timestamp {image_timestamp} to mediapipe ...')
            detector.detect_async(
                image=mp.Image(image_format=mp.ImageFormat.SRGB, data=image),
                timestamp_ms=image_timestamp)
            print('pushed')
            image_number += 1
    except KeyboardInterrupt:
        detector.close()
        print('\nexiting ...')


if __name__ == '__main__':
    main()
```
Eventually, and quite quickly, it hangs:
```
pushing image number 291 having image timestamp 54928231 to mediapipe ...
pushed
pushing image number 292 having image timestamp 54928263 to mediapipe ...
pushed
pushing image number 293 having image timestamp 54928295 to mediapipe ...
pushed
pushing image number 294 having image timestamp 54928331 to mediapipe ...
pushed
pushing image number 295 having image timestamp 54928363 to mediapipe ...
```
The reason to fix this (and the linked issue) is that back-pressure isn't useful if it breaks down when extended to the Python side. Honestly, though, I'd rather use the non-live API, handling concurrency on my own thread and managing overall back-pressure myself, in my specific application.
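For what it's worth, the workaround implied above (doing the real work on another thread so the registered callback returns immediately) can be sketched with just the standard library. Nothing here is MediaPipe-specific; the callback signature merely mirrors the one in the repro, and `processed_timestamps` is a stand-in for whatever per-frame work you actually do:

```python
import queue
import threading

# Hand-off queue: the MediaPipe result callback only enqueues,
# so it returns to the graph's callback thread almost immediately.
results: queue.Queue = queue.Queue()
processed_timestamps = []  # stand-in for real per-frame work


def result_callback(inference, output_image, timestamp_ms):
    # Runs on MediaPipe's callback thread: do no slow work here.
    results.put((inference, output_image, timestamp_ms))


def worker():
    # All slow processing (drawing, writing images, ...) happens here,
    # off the callback thread.
    while True:
        item = results.get()
        if item is None:  # sentinel: shut down
            break
        inference, output_image, timestamp_ms = item
        processed_timestamps.append(timestamp_ms)


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()
```

`result_callback` would then be passed as `result_callback=` in `FaceLandmarkerOptions`, exactly like `handle_pipeline_prediction_callback` above. Whether this fully avoids the hang is untested here, but it does keep the callback thread unblocked. Note the unbounded queue trades the broken back-pressure for memory growth if the worker can't keep up.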
Hi @matanox,
Apologies for the delayed response. Could you please let us know if this has been resolved on your end, or if you are still seeking a resolution?
Thank you!!
I think it's safest to just avoid the Python streaming API; at present I see no great benefit to using it from Python. Will reopen if things change.
Hi @matanox, I am also facing this problem. How did you set up your own concurrency so that you didn't have to use the LIVE_STREAM option? When I try to switch to IMAGE mode and feed in the images individually, I see the FPS of my stream cut in half.
I don't really recall seeing my rate cut in half. What rate are you seeing, and what is your hardware spec and camera model?
Since it's Python and we have the GIL, you can do something like acquire the images on a separate thread, which will slash the time you spend waiting for the camera to respond (I/O). Or switch to a language more naturally concurrent than Python. Either way it may imply always running MediaPipe inference over the previous frame, acquired on a thread, so that this minimal concurrency lets camera I/O wait times and inference CPU time happen concurrently.
Much (though not all) of MediaPipe's processing, as far as I recall, releases the GIL, so the performance gain will also depend on how much of your own processing you do on the inference results, or on anything else you do in the loop.
Think this through, it can help.
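A minimal sketch of that pattern, assuming OpenCV for acquisition (the class itself is plain stdlib): a single-slot holder that the capture thread overwrites, so the inference loop always runs on the freshest frame and stale frames are simply dropped, with no sleeps anywhere:

```python
import threading


class LatestFrame:
    # Single-slot frame holder: put() overwrites the slot, get()
    # blocks until a frame newer than the one last seen arrives.
    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._seq = 0  # monotonically increasing frame counter

    def put(self, frame):
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify()

    def get(self, last_seq=0):
        with self._cond:
            self._cond.wait_for(lambda: self._seq > last_seq)
            return self._frame, self._seq


# Capture thread (sketch; 'stream' would be a cv2.VideoCapture):
#     def capture(stream, slot):
#         while True:
#             ok, image = stream.read()
#             if ok:
#                 slot.put(image)
#
# Inference loop (IMAGE mode, no sleep anywhere):
#     seq = 0
#     while True:
#         frame, seq = slot.get(seq)
#         result = landmarker.detect(
#             mp.Image(image_format=mp.ImageFormat.SRGB, data=frame))
```

The sequence number is what makes overwriting safe: the consumer never re-processes the frame it already saw, and anything the producer wrote in between is dropped, which is exactly the back-pressure you want here.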
I've tried several computers/cameras, so I don't think it's hardware-related. But what software setup should I have? Right now my code is split like this:
- One thread is pulling frames off the webcam and putting them into a variable behind a thread lock. I also put a time.sleep in here.
- The main thread is an infinite loop that takes frames from the variable, runs landmarker.detect() on them, and then feeds the result into my processing function.
I get around 15 FPS, while with my previous LIVE_STREAM setup I would get 30 FPS but the cameras would crash. What should I change to increase my FPS?
A few things you should note when driving it forward:
- Using time.sleep() goes against any benefit from threading.
- Can't really guess what "cameras crashing" might mean, but try not to use the cheapest webcams of all.
- Of course performance is also hardware related, though it's not your first concern here most likely.
- You should verify the speed the camera thinks it is working at when your program is launching, using whatever library API you are using for camera acquisition, or something like OpenCV. Just to be on the safe side.
Given the first bullet, it looks like you should deepen your understanding of concurrency in Python to get out of this hole. As a first step, try to redesign with no use of sleep; after that you can use many resources, and something like ChatGPT, to learn more about ways to use concurrency in Python, as these matters are not specific to MediaPipe.
Even the official example code somewhere out there should run faster than 15 FPS, you may want to verify that in parallel!
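For the verification step suggested above, a small helper can measure the real end-to-end frame rate rather than trusting what the camera claims (measure_fps is a hypothetical name introduced here; with OpenCV, the claimed rate is what stream.get(cv2.CAP_PROP_FPS) returns):

```python
import time


def measure_fps(read_frame, n=120):
    # Time n frame reads; read_frame is any zero-arg callable,
    # e.g. lambda: stream.read() for a cv2.VideoCapture 'stream'.
    t0 = time.perf_counter()
    for _ in range(n):
        read_frame()
    return n / (time.perf_counter() - t0)


# Compare the measured rate against the camera's claimed rate:
#     claimed = stream.get(cv2.CAP_PROP_FPS)
#     actual = measure_fps(lambda: stream.read())
```

If the measured rate is already well below 30 with nothing but reads in the loop, the bottleneck is the camera or its driver settings (e.g. the FOURCC/resolution combination), not your threading.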