Pose Landmarker Jittering
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
No
OS Platform and Distribution
Mac OS X 13.0.1
MediaPipe Tasks SDK version
0.10.0
Task name (e.g. Image classification, Gesture recognition etc.)
Pose Landmark Detection
Programming Language and version (e.g. C++, Python, Java)
JavaScript
Describe the actual behavior
I have switched over from the legacy mediapipe library to the new mediapipe solutions. The landmarks are jittering more than expected when I use the VIDEO runningMode on GPU or CPU with any of the pose_landmarker tasks.
Describe the expected behaviour
The legacy mediapipe pose estimation solution offered a smoothing parameter (smoothLandmarks) to reduce the jittering, which worked quite well. I have not been able to find this option in the new mediapipe solutions library.
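For reference, this is roughly how that option was used in the legacy Python solution (a minimal sketch; the JavaScript solution exposed the same switch as smoothLandmarks, and the confidence values here are just illustrative defaults):

import mediapipe as mp

# Legacy solutions API: smoothing is a constructor flag.
pose = mp.solutions.pose.Pose(
    static_image_mode=False,       # video/tracking mode
    smooth_landmarks=True,         # the option missing from the new Tasks API
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
)
# results = pose.process(rgb_frame)  # rgb_frame: RGB numpy array for one video frame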
Standalone code/steps you may have used to try to get what you need
The jittering can be observed on the official mediapipe solutions demo page:
https://mediapipe-studio.webapps.google.com/demo/pose_landmarker
Other info / Complete Logs
Here is my sample code to instantiate the PoseLandmarker:
import { FilesetResolver, PoseLandmarker } from "@mediapipe/tasks-vision";

const vision = await FilesetResolver.forVisionTasks(
  // path/to/wasm/root
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
window.poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: `/models/pose_landmarker_${model_type}.task`,
    delegate: "GPU",
  },
  runningMode: "VIDEO",
});
@scottxp,
Could you please elaborate on your query with complete details, and share a captured image of the jitter if you can, so that we can understand the issue better?
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.
I am encountering a similar issue in Python. It seems that there is no landmark smoothing in mediapipe v0.10.1. The following two videos show the effect: in the video that uses mediapipe 0.10.1 the landmarks jitter a lot more than in the second video, which uses mediapipe 0.8.11.
The videos are basically a single image that is fed to the mediapipe solution and task in a loop. The same effect also happens when using a webcam.
https://github.com/google/mediapipe/assets/16905449/02381e1a-0514-41a9-81e0-20f6a6eeeced
https://github.com/google/mediapipe/assets/16905449/2310a792-dfd0-450d-a9ec-689ba28f7682
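For anyone who wants to reproduce this, here is a minimal sketch of that loop using the 0.10.x Tasks Python API (the model and image paths are placeholders):

import mediapipe as mp
from mediapipe.tasks.python import BaseOptions, vision

options = vision.PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path="pose_landmarker_full.task"),
    running_mode=vision.RunningMode.VIDEO,
)
landmarker = vision.PoseLandmarker.create_from_options(options)

image = mp.Image.create_from_file("person.jpg")  # one still image
for frame_idx in range(300):                     # fed in a loop at ~30 FPS
    timestamp_ms = int(frame_idx * 1000 / 30)
    result = landmarker.detect_for_video(image, timestamp_ms)
    # Without smoothing, result.pose_landmarks visibly jitters between
    # iterations even though the input never changes.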
My assumption is that the difference lies in the graph used for pose estimation. Version 0.8.11 uses the following graph, which has a smoothing calculator: https://github.com/google/mediapipe/blob/release/mediapipe/modules/pose_landmark/pose_landmark_cpu.pbtxt#L216
Version 0.10.1 builds a different graph that basically consists of a PoseLandmarkerGraph calculator and a FlowLimiterCalculator. The PoseLandmarkerGraph calculator contains two subgraphs, which I assume don't have any smoothing calculator in them. https://github.com/google/mediapipe/blob/91a3c54d558af8c4a0807d2bdd47e875a3c1e87a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_graph.cc#L219
Maybe it would be possible to enhance the graph that is created in mediapipe 0.10.1 and add the smoothing calculator. I will try doing that, but I'm not sure whether the input and output streams will be compatible.
https://github.com/google/mediapipe/assets/879510/cd51f61e-5f96-48a5-9685-f6e04bdcf435
LEFT: @mediapipe/tasks-vision RIGHT: @mediapipe/pose (pose.js)
As described by @igorbasko01, there does not appear to be any landmark smoothing in mediapipe 0.10.1. You can see the jittering landmarks in the video on the left while the video on the right does not jitter. These were captured and processed simultaneously using the same webcam stream but different mediapipe libraries.
Here is the code for the video on the LEFT:
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
window.poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: "/models/pose_landmarker_full.task",
    delegate: "GPU",
  },
  runningMode: "VIDEO",
});
And here is the code for the video on the RIGHT:
window.poseDetector = await poseDetection.createDetector("BlazePose", {
  runtime: "tfjs",
  enableSmoothing: true,
  modelType: "full",
  solutionPath: "https://cdn.jsdelivr.net/npm/@mediapipe/pose",
});
I have attempted to implement a Python version of the OneEuroFilter, closely modeled after the C++ version in the mediapipe codebase: https://github.com/google/mediapipe/blob/bed624f3b6f7ad5d25b5474c516561c537f10199/mediapipe/util/filtering/one_euro_filter.cc#L14
I've also replicated the same parameters for this Python OneEuroFilter, including setting the frequency to 30, which corresponds to the number of frames per second (FPS). I used the parameters that can be seen here: https://github.com/google/mediapipe/blob/c8c5f3d062f441eb37738c789a3550e7280ebefe/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt#L115
During the callback, I apply this filter to the Normalized Landmarks of the PoseLandmarkerResult. Notably, I created a distinct filter for each axis of each landmark.
Unfortunately, the jittering issue seems to persist, and I'm unable to observe any significant improvements.
For a closer look, you can find my Python filter implementation and usage in this gist: https://gist.github.com/igorbasko01/c51980df0ce9a516c8bcc4ff8e039eb7
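In condensed form, a Python one-euro filter modeled on the linked C++ code looks roughly like this (a sketch only; the class names are mine, and the default parameter values below should be double-checked against the pbtxt linked above):

import math

class LowPass:
    # Mirrors mediapipe's LowPassFilter, keeping the last raw sample around.
    def __init__(self):
        self.initialized = False
        self.raw = 0.0
        self.stored = 0.0

    def apply_with_alpha(self, value, alpha):
        if self.initialized:
            self.stored = alpha * value + (1.0 - alpha) * self.stored
        else:
            self.stored = value
            self.initialized = True
        self.raw = value
        return self.stored

class OneEuro:
    def __init__(self, frequency=30.0, min_cutoff=0.05, beta=80.0, derivate_cutoff=1.0):
        self.frequency = frequency
        self.min_cutoff = min_cutoff
        self.beta = beta
        self.derivate_cutoff = derivate_cutoff
        self.x = LowPass()   # filters the value itself
        self.dx = LowPass()  # filters the value's derivative

    def _alpha(self, cutoff):
        te = 1.0 / self.frequency
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / te)

    def apply(self, value):
        # Estimate and smooth the derivative, then let speed drive the cutoff:
        # fast motion -> high cutoff (low lag), slow motion -> strong smoothing.
        dvalue = (value - self.x.raw) * self.frequency if self.x.initialized else 0.0
        edvalue = self.dx.apply_with_alpha(dvalue, self._alpha(self.derivate_cutoff))
        cutoff = self.min_cutoff + self.beta * abs(edvalue)
        return self.x.apply_with_alpha(value, self._alpha(cutoff))

One caveat when applying this per axis on normalized landmarks: the C++ calculator scales the filter by the object's size (disable_value_scaling is off by default), so the same beta/min_cutoff values may behave quite differently without that scaling.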
I would greatly appreciate any help in addressing this issue or any advice on potential workarounds.
Pose landmark smoothing is not implemented yet according to the C++ Source Code: https://github.com/google/mediapipe/blob/df3f4167aed857c891395b4bab851a8a4f8024f8/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_graph.cc#L312
That being said, there is a new MultiWorldLandmarksSmoothingCalculator in the code, it's just not used anywhere yet: https://github.com/google/mediapipe/blob/df3f4167aed857c891395b4bab851a8a4f8024f8/mediapipe/calculators/util/multi_world_landmarks_smoothing_calculator.h#L57
Assuming it's functional, it might be possible to plug in the landmark outputs from the pose landmarker graph into it to get the smoothed landmarks, at least in C++.
// Edit: After some more testing I can confirm that using the calculator for smoothing works with the "one euro" filter!
Hey @Silverlan, thanks for the suggestion! Can you please elaborate a bit more on what exactly you did, and how you added the MultiWorldLandmarksSmoothingCalculator?
I use the C++ API. I can describe my steps, but I don't know the approach for the other APIs.
- Added //mediapipe/calculators/util:multi_world_landmarks_smoothing_calculator as a dependency to my project so I can use the MultiWorldLandmarksSmoothingCalculator calculator.
- Added the MultiWorldLandmarksSmoothingCalculator calculator to my graph with the one_euro filter (the velocity filter did not work for me):
{
  auto& smoothCalculator = graph.AddNode("MultiWorldLandmarksSmoothingCalculator");
  auto* options = &smoothCalculator.GetOptions<mediapipe::LandmarksSmoothingCalculatorOptions>();
  auto* filter = options->mutable_one_euro_filter();
  filter->set_beta(smoothingFilterSettings.beta);
  filter->set_disable_value_scaling(smoothingFilterSettings.disableValueScaling);
  filter->set_frequency(smoothingFilterSettings.frequency);
  filter->set_min_cutoff(smoothingFilterSettings.minCutoff);
  filter->set_derivate_cutoff(smoothingFilterSettings.derivateCutoff);
  filter->set_min_allowed_object_scale(smoothingFilterSettings.minAllowedObjectScale);
  worldLandmarks >> smoothCalculator.In("LANDMARKS");
  trackingIdsInput >> smoothCalculator.In("TRACKING_IDS");
  smoothCalculator.Out("FILTERED_LANDMARKS").SetName(outputName) >>
      graph[::mediapipe::api2::Output<std::vector<mediapipe::LandmarkList>>(graphOutputName)];
}
Make sure to add this node after the PoseLandmarkerGraph (or HandLandmarkerGraph) node. Then use the WORLD_LANDMARKS output of the PoseLandmarkerGraph for the LANDMARKS input of the MultiWorldLandmarksSmoothingCalculator node.
- For the TRACKING_IDS input you have to create a std::vector<int64_t> with exactly the same size as the number of poses. I just have one pose, so I initialized it with std::vector<int64_t> trackingIds {0}; you can then use that as input:
std::vector<int64_t> trackingIds = { 0 };
auto packetTrackingIds = mediapipe::MakePacket<std::vector<int64_t>>(trackingIds);
- The one_euro filter properties are critical; with the default settings I didn't notice any reduction in jitter at all. The values below worked for me:
smoothingFilterSettings.beta = 10.0
smoothingFilterSettings.minCutoff = 0.05
smoothingFilterSettings.derivateCutoff = 1
smoothingFilterSettings.disableValueScaling = false
smoothingFilterSettings.frequency = 30.0
smoothingFilterSettings.minAllowedObjectScale = 1e-06
You'll probably have to tweak them and play around with them though.
- It won't work without this step: You have to set a timestamp for all input packets:
auto msTime = cap.get(cv::CAP_PROP_POS_MSEC); // Time in milliseconds
auto mcTime = msTime * 1000.f;                // Time in microseconds
auto packetImg = mediapipe::MakePacket<mediapipe::Image>(*image);
packetImg = packetImg.At(mediapipe::Timestamp(mcTime));
auto packetArea = mediapipe::MakePacket<mediapipe::NormalizedRect>(MakeNormRect(0.5, 0.5, 1.0, 1.0, 0));
packetArea = packetArea.At(mediapipe::Timestamp(mcTime));
std::vector<int64_t> trackingIds = { 0 };
auto packetTrackingIds = mediapipe::MakePacket<std::vector<int64_t>>(trackingIds);
packetTrackingIds = packetTrackingIds.At(mediapipe::Timestamp(mcTime));
auto outputPackets = taskRunner.Process(
{ {"image", packetImg},
{"norm_rect",packetArea},
{"tracking_ids",packetTrackingIds}
});
- The FILTERED_LANDMARKS output of the MultiWorldLandmarksSmoothingCalculator node is your smoothed world landmarks.
Source Code: https://github.com/Silverlan/mediapipe_pragma_wrapper/blob/5d75a9cb7b6647522d33a8e2d8a30d82ad2b5dff/mediapipe/examples/desktop/mediapipe_pragma_wrapper/mediapipe_pragma_wrapper.cc#L645
Hope that helps!
Thanks a lot @Silverlan! I will try your example and see if I can also use it in Python.
@scottxp,
Could you please confirm whether this is still an issue or whether it has been resolved on your end. Thank you!
@igor-basko I would be very interested in your Python fix for this issue.
@kuaashish this is still very much an issue for a Python implementation.
@kuaashish This is still an issue for me using the JavaScript library.
Still an issue for me (Android & Python).
This is still an issue, confirmed on the JavaScript library.
Still an issue on Ubuntu with Python.
Has anyone made any progress on a Python solution that removes the jitter?
@scottxp,
We are pleased to announce the release of the latest version of MediaPipe, version 0.10.7, which addresses the jittering issue observed in the Pose Landmarker.
This fix is documented in the release notes under "Fixed Pose Landmarker jittering issue." We kindly request you to build with this updated version and let us know of any persisting issues from your perspective. Thank you!
@kuaashish this is not doing anything different in JavaScript with version 0.10.7:
await PoseLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: "https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/latest/pose_landmarker_lite.task",
    delegate: "GPU",
  },
  runningMode: "VIDEO",
  smoothLandmarks: true,
  numPoses: 1,
});
I can confirm the jittering still exists; maybe the models at https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/latest/pose_landmarker_*.task haven't been updated yet?
In Python mediapipe 0.10.7 it seems to work when I use runningMode VIDEO and the detector.detect_for_video function, but in JavaScript with runningMode VIDEO and poseLandmarker.detectForVideo it is still jittering.
Checked and confirmed: the jitter in JavaScript still persists, regardless of whether runningMode is VIDEO or LIVE_STREAM.
I confirm the problem is still there
Until the problem is fixed I am using smoothing on my side, as proposed by ChatGPT: https://gist.github.com/mupakoz/c7b3183914b52a08eebbc61599af7e1b
@npinochet @yiucheung0512 @WiCanIsCool @scottxp maybe it helps you
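A minimal sketch of what such client-side smoothing can look like in Python (a plain exponential moving average; this is not the gist's exact code, and alpha is a tunable placeholder):

class LandmarkSmoother:
    # Exponential moving average over successive PoseLandmarkerResult landmarks.
    def __init__(self, alpha=0.3):   # smaller alpha = smoother but laggier
        self.alpha = alpha
        self.state = None            # last smoothed (x, y, z) per landmark

    def smooth(self, landmarks):
        coords = [(lm.x, lm.y, lm.z) for lm in landmarks]
        if self.state is None:
            self.state = coords
        else:
            self.state = [
                tuple(self.alpha * c + (1.0 - self.alpha) * s
                      for c, s in zip(cur, prev))
                for cur, prev in zip(coords, self.state)
            ]
        return self.state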
The jittering issue has long been a problem, and I hope they can fix it so we can use this for motion capture. I just ran across a possible solution for the JavaScript version, but I have not tried it yet: https://github.com/yousufkalim/mediapipe-pose-smooth
Just found a video of a program I wrote using mediapipe and iClone two years ago; the same jitter shows in the hands: https://www.youtube.com/watch?v=j6JboJIlpfM
It's not the landmarker models; it is the single-shot detector of the pipeline. @igor-basko, @Silverlan, you can prove this by feeding looped image video frames directly through the face_mesh landmarker models. The BlazeFace detectors return detection boxes unreliably: they will return deviating boxes each frame on a looped image. The BlazeFace detector is "blazingly" fast, taking 2 ms on edge devices, and it returns several landmarks as well. The issue must be fixed at the detector or landmarker level; no amount of messing with Kalman filters will help post-recognition.
This seems solved with version 0.10.9
I have the same problem on Android. I am working with Next.js 14 and this is my current version: "@mediapipe/tasks-vision": "^0.10.9". This is how I create the PoseLandmarker:
export const loadPoseLandmarkerModel = async (): Promise<Uint8Array> => {
  const response = await fetch(`/static/models/pose_landmarker_lite.task`);
  if (!response.ok) {
    throw new Error(`Failed to load pose landmarker model file: ${response.statusText}`);
  }
  const buffer = await response.arrayBuffer();
  return new Uint8Array(buffer);
};

export const createPoseLandmarker = async (runningMode: "VIDEO" | "IMAGE"): Promise<PoseLandmarker | null> => {
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.9/wasm"
  );
  const model = await loadPoseLandmarkerModel();
  return PoseLandmarker.createFromOptions(vision, {
    baseOptions: {
      modelAssetBuffer: model,
      delegate: "GPU",
    },
    runningMode: runningMode,
    numPoses: 1,
  });
};
Here is the result on Android:
https://github.com/google/mediapipe/assets/91951421/552b4f84-0893-4af2-91c0-e5e5d5d60eda
Am I doing something wrong?
Also check the MediaPipe version number in your package.json.
The following code works for me.
import { FilesetResolver, PoseLandmarker } from "@mediapipe/tasks-vision";

export const createPoseLandmarker = async (runningMode: "VIDEO" | "IMAGE"): Promise<PoseLandmarker | null> => {
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.9/wasm"
  );
  const poseLandmarker = await PoseLandmarker.createFromModelPath(
    vision,
    "https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task"
  );
  await poseLandmarker.setOptions({
    runningMode: runningMode,
    numPoses: 1,
  });
  return poseLandmarker;
};
- I updated the link: https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.9/wasm (as @hiroMTB suggested)
- Removed the delegate: "GPU" option.

On the last point, I did some tests: when the delegate prop is set to "GPU" the tracking is not optimal and jitters a lot, and when I remove the prop it works fine.
I also receive this exception on initialization:
In the end it is working fine, a bit slower than on my iOS device, but it's getting the job done! Thanks a lot for the help!
Glad it helped. GPU inference works fine on my side, but I'm on macOS Chrome. Maybe try clearing the cache. I also see a mysterious error around the delegate option; it appears and disappears depending on the day.