
Pose Landmarker Jittering

Open scottxp opened this issue 2 years ago • 38 comments

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

Mac OS X 13.0.1

MediaPipe Tasks SDK version

0.10.0

Task name (e.g. Image classification, Gesture recognition etc.)

Pose Landmark Detection

Programming Language and version (e.g. C++, Python, Java)

Javascript

Describe the actual behavior

I have switched over from the legacy mediapipe library to the new mediapipe solutions. The landmarks are jittering more than expected when I use the VIDEO runningMode on GPU or CPU with any of the pose_landmarker tasks.

Describe the expected behaviour

The legacy MediaPipe pose estimation solution offered a smoothing parameter (smoothLandmarks) to reduce the jittering, which worked quite well. I have not been able to find this option in the new MediaPipe solutions library.
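
For reference, the legacy JS solution exposed it roughly like this (a minimal sketch of the old @mediapipe/pose API; the option values are only illustrative):

import { Pose } from "@mediapipe/pose";

const pose = new Pose({
  locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/pose/${file}`,
});
pose.setOptions({
  modelComplexity: 1,
  smoothLandmarks: true, // the temporal smoothing switch missing from the new Tasks API
  minDetectionConfidence: 0.5,
  minTrackingConfidence: 0.5,
});
pose.onResults((results) => {
  // results.poseLandmarks arrive already smoothed over time
});
// per frame: await pose.send({ image: videoElement });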

Standalone code/steps you may have used to try to get what you need

The jittering can be observed on the official mediapipe solutions demo page:

https://mediapipe-studio.webapps.google.com/demo/pose_landmarker

Other info / Complete Logs

Here is my sample code to instantiate the PoseLandmarker:

const vision = await FilesetResolver.forVisionTasks(
  // path/to/wasm/root
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
window.poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: `/models/pose_landmarker_${model_type}.task`,
    delegate: "GPU",
  },
  runningMode: "VIDEO",
});
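
For completeness, the per-frame detection call follows the usual web sample and looks roughly like this (simplified sketch; the drawing helper and the video element id are placeholders, not my actual app code):

let lastVideoTime = -1;

function renderLoop() {
  const video = document.getElementById("webcam"); // placeholder id
  if (video.currentTime !== lastVideoTime) {
    lastVideoTime = video.currentTime;
    // VIDEO mode expects a monotonically increasing timestamp per frame.
    window.poseLandmarker.detectForVideo(video, performance.now(), (result) => {
      // result.landmarks: one array of normalized landmarks per detected pose.
      drawLandmarks(result.landmarks); // drawing helper omitted
    });
  }
  requestAnimationFrame(renderLoop);
}
renderLoop();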

scottxp avatar Jun 02 '23 17:06 scottxp

@scottxp,

Could you please elaborate on your query with complete details and, if possible, share a captured image of the jitter so we can understand the issue better?

kuaashish avatar Jun 13 '23 09:06 kuaashish

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar Jun 21 '23 01:06 github-actions[bot]

I am encountering a similar issue in Python. It seems that there is no landmark smoothing in mediapipe v0.10.1. The following two videos show the effect: in the video that uses mediapipe 0.10.1 the landmarks jitter a lot more than in the second video, which uses mediapipe 0.8.11.

The videos basically show a single image that is fed to the mediapipe solution and task in a loop. The same effect also happens when using a webcam.
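
(For anyone who wants to check the same thing in the browser, a rough JavaScript equivalent is to push one static image through detectForVideo with increasing timestamps and watch whether a landmark keeps moving; the element id and frame count below are only illustrative.)

const image = document.getElementById("testImage"); // assumed preloaded <img>
let t = performance.now();
for (let i = 0; i < 100; i++) {
  t += 33; // simulate ~30 fps timestamps
  poseLandmarker.detectForVideo(image, t, (result) => {
    if (result.landmarks.length > 0) {
      // With working smoothing this value should settle; without it, it keeps wobbling.
      console.log("nose x:", result.landmarks[0][0].x);
    }
  });
}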

https://github.com/google/mediapipe/assets/16905449/02381e1a-0514-41a9-81e0-20f6a6eeeced

https://github.com/google/mediapipe/assets/16905449/2310a792-dfd0-450d-a9ec-689ba28f7682

My assumption is that the difference lies in the graph that is used for pose estimation. In version 0.8.11 it uses the following graph, which has a smoothing calculator: https://github.com/google/mediapipe/blob/release/mediapipe/modules/pose_landmark/pose_landmark_cpu.pbtxt#L216

In 0.10.1, on the other hand, it builds a different graph that basically contains a PoseLandmarkerGraph calculator and a FlowLimiterCalculator. The PoseLandmarkerGraph calculator consists of two subgraphs, which I assume don't have any smoothing calculator in them. https://github.com/google/mediapipe/blob/91a3c54d558af8c4a0807d2bdd47e875a3c1e87a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_graph.cc#L219

Maybe it would be possible to enhance the graph that is created in mediapipe 0.10.1, and add the smoothing calculator. I will try doing that, but I'm not sure if the input and output streams will be compatible.

igorbasko01 avatar Jun 21 '23 10:06 igorbasko01

https://github.com/google/mediapipe/assets/879510/cd51f61e-5f96-48a5-9685-f6e04bdcf435

LEFT: @mediapipe/tasks-vision (new Tasks API) RIGHT: @mediapipe/pose (legacy pose.js)

As described by @igorbasko01, there does not appear to be any landmark smoothing in mediapipe 0.10.1. You can see the jittering landmarks in the video on the left while the video on the right does not jitter. These were captured and processed simultaneously using the same webcam stream but different mediapipe libraries.

Here is the code for the video on the LEFT:

const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
window.poseLandmarker = await PoseLandmarker.createFromOptions(
  vision,
  {
    baseOptions: {
      modelAssetPath: "/models/pose_landmarker_full.task",
      delegate: "GPU",
    },
    runningMode: "VIDEO"
  }
);

And here is the code for the video on the RIGHT:

window.poseDetector = await poseDetection.createDetector("BlazePose", {
  runtime: "tfjs",
  enableSmoothing: true,
  modelType: "full",
  solutionPath: "https://cdn.jsdelivr.net/npm/@mediapipe/pose"
});

scottxp avatar Jun 22 '23 19:06 scottxp

I have attempted to implement a Python version of the OneEuroFilter, closely modeled after the C++ version found at this mediapipe implementation: https://github.com/google/mediapipe/blob/bed624f3b6f7ad5d25b5474c516561c537f10199/mediapipe/util/filtering/one_euro_filter.cc#L14

I've also replicated the same parameters for this Python OneEuroFilter, including setting the frequency to 30, which corresponds to the number of frames per second (FPS). I used the parameters that can be seen here: https://github.com/google/mediapipe/blob/c8c5f3d062f441eb37738c789a3550e7280ebefe/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt#L115

During the callback, I apply this filter to the Normalized Landmarks of the PoseLandmarkerResult. Notably, I created a distinct filter for each axis of each landmark.

Unfortunately, the jittering issue seems to persist, and I'm unable to observe any significant improvements.

For a closer look, you can find my Python filter implementation and usage in this gist: https://gist.github.com/igorbasko01/c51980df0ce9a516c8bcc4ff8e039eb7

I would greatly appreciate any help in addressing this issue or any advice on potential workarounds.

igorbasko01 avatar Jun 27 '23 15:06 igorbasko01

Pose landmark smoothing is not implemented yet according to the C++ Source Code: https://github.com/google/mediapipe/blob/df3f4167aed857c891395b4bab851a8a4f8024f8/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_graph.cc#L312

That being said, there is a new MultiWorldLandmarksSmoothingCalculator in the code; it's just not used anywhere yet: https://github.com/google/mediapipe/blob/df3f4167aed857c891395b4bab851a8a4f8024f8/mediapipe/calculators/util/multi_world_landmarks_smoothing_calculator.h#L57

Assuming it's functional, it might be possible to plug in the landmark outputs from the pose landmarker graph into it to get the smoothed landmarks, at least in C++.

// Edit: After some more testing I can confirm that using the calculator for smoothing works with the "one euro" filter!

Silverlan avatar Jul 10 '23 11:07 Silverlan

Hey @Silverlan, thanks for the suggestion! Can you please elaborate a bit more on what exactly you did and how you added the MultiWorldLandmarksSmoothingCalculator?

igor-basko avatar Jul 11 '23 11:07 igor-basko

Hey @Silverlan, thanks for the suggestion! Can you please elaborate a bit more on what exactly you did and how you added the MultiWorldLandmarksSmoothingCalculator?

I use the C++ API. I can describe my steps, but I don't know the approach for the other APIs.

  1. Added //mediapipe/calculators/util:multi_world_landmarks_smoothing_calculator as a dependency to my project so I can use the MultiWorldLandmarksSmoothingCalculator calculator.
  2. Added the MultiWorldLandmarksSmoothingCalculator calculator to my graph with the one_euro filter (the velocity filter did not work for me):
{
  auto& smoothCalculator = graph.AddNode("MultiWorldLandmarksSmoothingCalculator");
  auto* options = &smoothCalculator.GetOptions<mediapipe::LandmarksSmoothingCalculatorOptions>();

  auto* filter = options->mutable_one_euro_filter();
  filter->set_beta(smoothingFilterSettings.beta);
  filter->set_disable_value_scaling(smoothingFilterSettings.disableValueScaling);
  filter->set_frequency(smoothingFilterSettings.frequency);
  filter->set_min_cutoff(smoothingFilterSettings.minCutoff);
  filter->set_derivate_cutoff(smoothingFilterSettings.derivateCutoff);
  filter->set_min_allowed_object_scale(smoothingFilterSettings.minAllowedObjectScale);

  worldLandmarks >> smoothCalculator.In("LANDMARKS");
  trackingIdsInput >> smoothCalculator.In("TRACKING_IDS");
  smoothCalculator.Out("FILTERED_LANDMARKS").SetName(outputName) >>
      graph[::mediapipe::api2::Output<std::vector<mediapipe::LandmarkList>>(graphOutputName)];
}

Make sure to add this node after the PoseLandmarkerGraph (or HandLandmarkerGraph) node. Then use the WORLD_LANDMARKS output of the PoseLandmarkerGraph for the LANDMARKS input of the MultiWorldLandmarksSmoothingCalculator node.

  3. For the TRACKING_IDS input you have to create a std::vector<int64_t> with exactly the same size as the number of poses. I only have one pose, so I just initialized it with std::vector<int64_t> trackingIds {0}; then you can use that as input:
std::vector<int64_t> trackingIds = { 0 };
auto packetTrackingIds = mediapipe::MakePacket<std::vector<int64_t>>(trackingIds);
  4. The one_euro filter properties are critical; with the default settings I didn't notice any reduction in jitter at all. The values below worked for me:
smoothingFilterSettings.beta = 10.0
smoothingFilterSettings.minCutoff = 0.05
smoothingFilterSettings.derivateCutoff = 1
smoothingFilterSettings.disableValueScaling = false
smoothingFilterSettings.frequency = 30.0
smoothingFilterSettings.minAllowedObjectScale = 1e-06

You'll probably have to tweak and experiment with them, though.

  5. It won't work without this step: you have to set a timestamp for all input packets:
auto msTime = cap.get(cv::CAP_PROP_POS_MSEC); // Time in milliseconds
auto mcTime = msTime * 1000.f; // Time in microseconds

auto packetImg = mediapipe::MakePacket<mediapipe::Image>(*image);
packetImg = packetImg.At(mediapipe::Timestamp(mcTime));

auto packetArea = mediapipe::MakePacket<mediapipe::NormalizedRect>(MakeNormRect(0.5, 0.5, 1.0, 1.0, 0));
packetArea = packetArea.At(mediapipe::Timestamp(mcTime));

std::vector<int64_t> trackingIds = { 0 };
auto packetTrackingIds = mediapipe::MakePacket<std::vector<int64_t>>(trackingIds);
packetTrackingIds = packetTrackingIds.At(mediapipe::Timestamp(mcTime));

auto outputPackets = taskRunner.Process(
    { {"image", packetImg},
      {"norm_rect", packetArea},
      {"tracking_ids", packetTrackingIds} });
  6. The FILTERED_LANDMARKS output of the MultiWorldLandmarksSmoothingCalculator node contains your smoothed world landmarks.

Source Code: https://github.com/Silverlan/mediapipe_pragma_wrapper/blob/5d75a9cb7b6647522d33a8e2d8a30d82ad2b5dff/mediapipe/examples/desktop/mediapipe_pragma_wrapper/mediapipe_pragma_wrapper.cc#L645

Hope that helps!

Silverlan avatar Jul 11 '23 17:07 Silverlan

Thanks a lot @Silverlan! I will try to use your example and see if I can also use it in Python.

igor-basko avatar Jul 13 '23 06:07 igor-basko

@scottxp,

Could you please confirm whether this is still an issue or it has been resolved on your end? Thank you!

kuaashish avatar Aug 11 '23 06:08 kuaashish

@igor-basko I would be very interested in your python fix for this issue.

@kuaashish this is still very much an issue for a python implementation.

mcdonasd1212 avatar Aug 18 '23 20:08 mcdonasd1212

@kuaashish This is still an issue for me using the javascript library.

scottxp avatar Aug 23 '23 14:08 scottxp

still an issue for me (android&python)

124bit avatar Aug 31 '23 07:08 124bit

this is still an issue, confirmed on the javascript library

mihaiEDW avatar Aug 31 '23 11:08 mihaiEDW

Still an issue on Ubuntu with Python.

mcdonasd1212 avatar Aug 31 '23 15:08 mcdonasd1212

Has anyone made any progress on a Python solution that removes the jitter?

mcdonasd1212 avatar Sep 16 '23 02:09 mcdonasd1212

@scottxp,

We are pleased to announce the release of the latest version of MediaPipe, version 0.10.7, which addresses the jittering issue observed in the Pose Landmarker.

This issue has been documented in the release notes under "Fixed Pose Landmarker jittering issue." We kindly request that you build using this updated version and let us know if any issues persist on your end. Thank you!

kuaashish avatar Oct 10 '23 06:10 kuaashish

@kuaashish this is not doing anything different in JavaScript with version 0.10.7:

await PoseLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: 'https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/latest/pose_landmarker_lite.task',
    delegate: "GPU"
  },
  runningMode: "VIDEO",
  smoothLandmarks: true,
  numPoses: 1
});

WiCanIsCool avatar Oct 10 '23 06:10 WiCanIsCool

I can confirm the jittering still exists; maybe the models at https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/latest/pose_landmarker_*.task haven't been updated yet?

npinochet avatar Oct 10 '23 16:10 npinochet

In Python mediapipe 0.10.7 it seems to work when I use runningMode VIDEO and the detector.detect_for_video function, but in JavaScript with runningMode VIDEO and poseLandmarker.detectForVideo it is still jittering.

WiCanIsCool avatar Oct 11 '23 08:10 WiCanIsCool

Checked and confirmed: the jitter in JavaScript still persists, regardless of whether runningMode is VIDEO or LIVE_STREAM.

yiucheung0512 avatar Nov 03 '23 00:11 yiucheung0512

I confirm the problem is still there

Until the problem is fixed, I am using smoothing on my side, as proposed by ChatGPT: https://gist.github.com/mupakoz/c7b3183914b52a08eebbc61599af7e1b

@npinochet @yiucheung0512 @WiCanIsCool @scottxp maybe it helps you
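
(For reference, a minimal one-euro-style filter in plain JavaScript looks roughly like the sketch below. The parameter defaults are only starting points loosely based on the values discussed earlier in this thread, not what MediaPipe uses internally, so tune them per use case.)

class LowPass {
  constructor() { this.prev = null; }
  filter(value, alpha) {
    this.prev = this.prev === null ? value : alpha * value + (1 - alpha) * this.prev;
    return this.prev;
  }
}

class OneEuroFilter {
  // minCutoff: base smoothing; beta: how much velocity reduces smoothing.
  constructor({ minCutoff = 0.05, beta = 10.0, dCutoff = 1.0 } = {}) {
    Object.assign(this, { minCutoff, beta, dCutoff });
    this.x = new LowPass();
    this.dx = new LowPass();
    this.lastTimeMs = null;
  }
  alpha(cutoff, dt) {
    const tau = 1 / (2 * Math.PI * cutoff);
    return 1 / (1 + tau / dt);
  }
  filter(value, timestampMs) {
    if (this.lastTimeMs === null) {
      this.lastTimeMs = timestampMs;
      this.dx.filter(0, 1);
      return this.x.filter(value, 1);
    }
    const dt = Math.max((timestampMs - this.lastTimeMs) / 1000, 1e-6);
    this.lastTimeMs = timestampMs;
    const velocity = (value - this.x.prev) / dt;
    const smoothedVelocity = this.dx.filter(velocity, this.alpha(this.dCutoff, dt));
    const cutoff = this.minCutoff + this.beta * Math.abs(smoothedVelocity);
    return this.x.filter(value, this.alpha(cutoff, dt));
  }
}

// Usage: keep one filter per landmark per axis and feed it the frame timestamp, e.g.
// smoothedX = filtersForLandmark[i].x.filter(landmarks[i].x, performance.now());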

mupakoz avatar Nov 12 '23 07:11 mupakoz

The jittering issue has long been a problem, and I hope they can fix it so we can use this for motion capture. I just ran across a possible solution for the JavaScript version, but I have not tried it yet: https://github.com/yousufkalim/mediapipe-pose-smooth

I just found a video of a program I wrote using MediaPipe and iClone two years ago; it shows the same jitter with the hands: https://www.youtube.com/watch?v=j6JboJIlpfM

delebash avatar Nov 29 '23 22:11 delebash

It's not the landmarker models; it is the single-shot detector in the pipeline. @igor-basko, @Silverlan, you can prove this by feeding looped image video frames directly through the face_mesh landmarker models. The BlazeFace detector returns the detection box unreliably: it returns deviating boxes on each frame of a looped image. The issue must be fixed at the model level. The BlazeFace detector is "blazingly" fast, taking 2 ms on edge devices, and it returns several landmarks as well. The issue must be fixed at that level or at the landmarker level; no amount of messing with Kalman filters will help post-recognition.

bedbad avatar Feb 02 '24 21:02 bedbad

This seems solved with version 0.10.9

hiroMTB avatar Feb 06 '24 09:02 hiroMTB

I have the same problem on Android. I am working with Next.js 14 and this is my current version: "@mediapipe/tasks-vision": "^0.10.9". This is how I create the PoseLandmarker:

export const loadPoseLandmarkerModel = async (): Promise<Uint8Array> => {
    const response = await fetch(`/static/models/pose_landmarker_lite.task`);
    if (!response.ok) {
        throw new Error(`Failed to load pose landmarker model file: ${response.statusText}`);
    }
    const buffer = await response.arrayBuffer();
    return new Uint8Array(buffer);
};

export const createPoseLandmarker = async (runningMode: "VIDEO" | "IMAGE"): Promise<PoseLandmarker | null> => {
    const vision = await FilesetResolver.forVisionTasks(
        "https://cdn.jsdelivr.net/npm/@mediapipe/[email protected]/wasm"
    );

    const model = await loadPoseLandmarkerModel();

    return PoseLandmarker.createFromOptions(vision, {
        baseOptions: {
            modelAssetBuffer: model,
            delegate: "GPU",
        },
        runningMode: runningMode,
        numPoses: 1,
    });
};

Here is the result on Android:

https://github.com/google/mediapipe/assets/91951421/552b4f84-0893-4af2-91c0-e5e5d5d60eda

Am I doing something wrong?

cstamati avatar Feb 06 '24 09:02 cstamati

Hello, try 0.10.9 like below

https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.9/wasm

hiroMTB avatar Feb 06 '24 11:02 hiroMTB

Also check the MediaPipe version number in your package.json.

hiroMTB avatar Feb 06 '24 11:02 hiroMTB

The following code works for me.

import { FilesetResolver, PoseLandmarker } from "@mediapipe/tasks-vision";

export const createPoseLandmarker = async (runningMode: "VIDEO" | "IMAGE"): Promise<PoseLandmarker | null> => {
    const vision = await FilesetResolver.forVisionTasks(
        "https://cdn.jsdelivr.net/npm/@mediapipe/[email protected]/wasm"
    );

    const poseLandmarker = await PoseLandmarker.createFromModelPath(
        vision,
        "https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task"
    );

    await poseLandmarker.setOptions({
        runningMode: runningMode,
        numPoses: 1,
    });

    return poseLandmarker;
};
  • I updated the link: https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.9/wasm (like @hiroMTB suggested)
  • Removed the delegate: "GPU"

On the last point, I did some tests: when the delegate prop is set to "GPU", the tracking is not optimal and jitters a lot, and when I remove the prop it works fine.

I also receive this exception on initialization (screenshot attached).

In the end it is working fine, a bit slower than on my iOS device, but it's getting the job done! Thanks a lot for the help!

cstamati avatar Feb 06 '24 12:02 cstamati

Glad it helped. GPU inference works fine on my side, but I'm on macOS Chrome. Maybe try clearing the cache. I also see a mysterious error around the delegate option; it appears and disappears depending on the day.

hiroMTB avatar Feb 06 '24 12:02 hiroMTB