mediapipe
[HOLISTIC SOLUTION] Info about the visibility/confidence of keypoints from the hands is not available.
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
No
OS Platform and Distribution
Ubuntu
MediaPipe Tasks SDK version
Holistic
Task name (e.g. Image classification, Gesture recognition etc.)
Holistic
Programming Language and version (e.g. C++, Python, Java)
Python
Describe the actual behavior
In the current version, visibility/confidence information for the hand keypoints is not available.
Describe the expected behaviour
Provide confidence information for the extracted hand keypoints.
Standalone code/steps you may have used to try to get what you need
In the current holistic solution, the visibility and presence fields for the hands are always 0.
Unlike the hand solution, which has a field called Handedness where the confidence or score for the hand is indicated, the holistic solution has no output indicating the quality or confidence of the keypoints obtained.
Is this a bug? Or am I missing something?
Thank you very much.
Other info / Complete Logs
HolisticLandmarkerResult(face_landmarks=[
NormalizedLandmark(x=0.4745168089866638, y=0.36261075735092163, z=-0.0224269051104784, visibility=0.0, presence=0.0),
....
NormalizedLandmark(x=0.5119104385375977, y=0.2810891270637512, z=0.005499151535332203, visibility=0.0, presence=0.0)],
pose_landmarks=[
NormalizedLandmark(x=0.47517403960227966, y=0.3143022358417511, z=-0.9151485562324524, visibility=0.9999208450317383, presence=0.9995543360710144),
....
NormalizedLandmark(x=0.41832780838012695, y=1.8102238178253174, z=0.12485508620738983, visibility=0.005907772108912468, presence=0.001108874916099012)],
pose_world_landmarks=[
Landmark(x=-0.046613965183496475, y=-0.5604096055030823, z=-0.3200050890445709, visibility=0.9999208450317383, presence=0.9995543360710144),
...
Landmark(x=-0.12133946269750595, y=0.5424543023109436, z=0.04660561680793762, visibility=0.005907772108912468, presence=0.001108874916099012)],
left_hand_landmarks=[
NormalizedLandmark(x=0.5576450228691101, y=0.7599831819534302, z=4.721105995031394e-07, visibility=0.0, presence=0.0),
....
NormalizedLandmark(x=0.6063085794448853, y=0.5707101821899414, z=-0.08902209997177124, visibility=0.0, presence=0.0)],
left_hand_world_landmarks=[
Landmark(x=0.019411759451031685, y=-0.2692203223705292, z=-0.36530426144599915, visibility=0.0, presence=0.0),
.....
Landmark(x=0.017838725820183754, y=-0.3276180922985077, z=-0.42397379875183105, visibility=0.0, presence=0.0)],
right_hand_landmarks=[
NormalizedLandmark(x=0.3994499146938324, y=0.7287973761558533, z=3.09612943283355e-07, visibility=0.0, presence=0.0),
.....
NormalizedLandmark(x=0.3777098059654236, y=0.6123549938201904, z=-0.02483273483812809, visibility=0.0, presence=0.0)],
right_hand_world_landmarks=[
Landmark(x=-0.1434965282678604, y=-0.22600455582141876, z=-0.3554910123348236, visibility=0.0, presence=0.0),
.....
Landmark(x=-0.14976395666599274, y=-0.30362075567245483, z=-0.39388307929039, visibility=0.0, presence=0.0)],
face_blendshapes=None, segmentation_mask=None)
Hi @mvazquezgts,
Could you please provide additional information about the problem? Please include the following details:
- Outline the steps you are following to implement this, based on the documentation.
- Specify the Ubuntu version you are using.
- Provide the MediaPipe version being used, along with the Python version.
Providing this information will help us better understand and address the issue.
Thank you!!
OS: Ubuntu
Programming Language: Python
MediaPipe version: 0.10.11
Solution: Holistic
Given an input image/frame the output of the model is:
HolisticLandmarkerResult(face_landmarks=[ NormalizedLandmark(x=0.4745168089866638, y=0.36261075735092163, z=-0.0224269051104784, visibility=0.0, presence=0.0), .... NormalizedLandmark(x=0.5119104385375977, y=0.2810891270637512, z=0.005499151535332203, visibility=0.0, presence=0.0)],
pose_landmarks=[ NormalizedLandmark(x=0.47517403960227966, y=0.3143022358417511, z=-0.9151485562324524, visibility=0.9999208450317383, presence=0.9995543360710144), .... NormalizedLandmark(x=0.41832780838012695, y=1.8102238178253174, z=0.12485508620738983, visibility=0.005907772108912468, presence=0.001108874916099012)],
pose_world_landmarks=[
Landmark(x=-0.046613965183496475, y=-0.5604096055030823, z=-0.3200050890445709, visibility=0.9999208450317383, presence=0.9995543360710144), ... Landmark(x=-0.12133946269750595, y=0.5424543023109436, z=0.04660561680793762, visibility=0.005907772108912468, presence=0.001108874916099012)],
left_hand_landmarks=[ NormalizedLandmark(x=0.5576450228691101, y=0.7599831819534302, z=4.721105995031394e-07, visibility=0.0, presence=0.0), .... NormalizedLandmark(x=0.6063085794448853, y=0.5707101821899414, z=-0.08902209997177124, visibility=0.0, presence=0.0)],
left_hand_world_landmarks=[ Landmark(x=0.019411759451031685, y=-0.2692203223705292, z=-0.36530426144599915, visibility=0.0, presence=0.0), ..... Landmark(x=0.017838725820183754, y=-0.3276180922985077, z=-0.42397379875183105, visibility=0.0, presence=0.0)],
right_hand_landmarks=[ NormalizedLandmark(x=0.3994499146938324, y=0.7287973761558533, z=3.09612943283355e-07, visibility=0.0, presence=0.0), ..... NormalizedLandmark(x=0.3777098059654236, y=0.6123549938201904, z=-0.02483273483812809, visibility=0.0, presence=0.0)],
right_hand_world_landmarks=[ Landmark(x=-0.1434965282678604, y=-0.22600455582141876, z=-0.3554910123348236, visibility=0.0, presence=0.0), ..... Landmark(x=-0.14976395666599274, y=-0.30362075567245483, z=-0.39388307929039, visibility=0.0, presence=0.0)],
face_blendshapes=None, segmentation_mask=None)
The available data/fields are:
- face_landmarks [ x, y , z, visibility, presence]
- pose_landmarks & pose_world_landmarks [ x, y , z, visibility, presence]
- left_hand_landmarks & left_hand_world_landmarks [ x, y , z, visibility, presence]
- right_hand_landmarks & right_hand_world_landmarks [ x, y , z, visibility, presence]
- face_blendshapes
- segmentation_mask
Visibility and presence for both the hands and the face are always 0, regardless of the configuration used to set up/initialise the model. Only pose_landmarks & pose_world_landmarks carry non-zero values.
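To make the observation above easy to reproduce, here is a minimal sketch that reports which landmark groups in a result carry only zero visibility/presence. It only assumes the attribute names shown in the log above; `FakeLandmark`, `GROUPS` and `groups_without_scores` are hypothetical names introduced for illustration, and the sample values are shortened from the log so the snippet runs without MediaPipe installed.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Dict


@dataclass
class FakeLandmark:
    """Hypothetical stand-in for NormalizedLandmark / Landmark."""
    x: float
    y: float
    z: float
    visibility: float = 0.0
    presence: float = 0.0


# Attribute names as they appear in the HolisticLandmarkerResult log above.
GROUPS = (
    "face_landmarks",
    "pose_landmarks",
    "pose_world_landmarks",
    "left_hand_landmarks",
    "left_hand_world_landmarks",
    "right_hand_landmarks",
    "right_hand_world_landmarks",
)


def groups_without_scores(result) -> Dict[str, bool]:
    """Map each non-empty landmark group to True if every landmark in it
    has visibility and presence equal to 0 (or unset)."""
    report = {}
    for name in GROUPS:
        landmarks = getattr(result, name, None) or []
        if not landmarks:
            continue  # group missing or empty: nothing to report
        report[name] = all(
            not (lm.visibility or 0.0) and not (lm.presence or 0.0)
            for lm in landmarks
        )
    return report


# Sample built from (rounded) values in the log; not a real detection result.
example = SimpleNamespace(
    face_landmarks=[FakeLandmark(0.4745, 0.3626, -0.0224)],
    pose_landmarks=[FakeLandmark(0.4752, 0.3143, -0.9151, 0.9999, 0.9996)],
    left_hand_landmarks=[FakeLandmark(0.5576, 0.7600, 0.0)],
    right_hand_landmarks=[FakeLandmark(0.3994, 0.7288, 0.0)],
)
report = groups_without_scores(example)
print(report)
```

Passing a real result object instead of `example` should work the same way, as long as it exposes the attributes listed in `GROUPS`.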
So, my question is whether this is a bug or if it is possible to get the confidence/visibility of the hand points in another way. In the hands model (documentation: https://developers.google.com/mediapipe/solutions/vision/hand_landmarker/python) I see that there is a 'Handedness' field that contains this information.
HandLandmarkerResult:
  Handedness:
    Categories #0:
      index        : 0
      score        : 0.98396
      categoryName : Left
  Landmarks:
    Landmark #0:
      x            : 0.638852
      y            : 0.671197
      z            : -3.41E-7
    Landmark #1:
      x            : 0.634599
      y            : 0.536441
      z            : -0.06984
    ... (21 landmarks for a hand)
  WorldLandmarks:
    Landmark #0:
      x            : 0.067485
      y            : 0.031084
      z            : 0.055223
    Landmark #1:
      x            : 0.063209
      y            : -0.00382
      z            : 0.020920
But it seems that in this version of Holistic there is no way to get a score for hand points.
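One possible stopgap, not confirmed by anyone in this thread, is to use the pose wrist landmarks as a rough proxy, since the pose landmarks are the only ones that carry non-zero visibility/presence. The sketch below assumes the pose output follows the 33-point MediaPipe Pose topology, where index 15 is the left wrist and index 16 is the right wrist; `hand_confidence_proxy` is a hypothetical helper, and the sample values are invented for the demonstration. This is a wrist-level score only, not a per-keypoint hand confidence.

```python
from types import SimpleNamespace
from typing import Optional, Sequence

# Assumed indices in the 33-point MediaPipe Pose topology.
LEFT_WRIST = 15
RIGHT_WRIST = 16


def hand_confidence_proxy(pose_landmarks: Sequence, side: str) -> Optional[float]:
    """Return min(visibility, presence) of the pose wrist for the given side,
    or None if the pose landmarks are missing or too short."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if not pose_landmarks or len(pose_landmarks) <= RIGHT_WRIST:
        return None
    wrist = pose_landmarks[LEFT_WRIST if side == "left" else RIGHT_WRIST]
    return min(wrist.visibility or 0.0, wrist.presence or 0.0)


# Invented sample: 33 pose landmarks, only the wrists given non-zero scores.
pose = [SimpleNamespace(visibility=0.0, presence=0.0) for _ in range(33)]
pose[LEFT_WRIST] = SimpleNamespace(visibility=0.9, presence=0.8)
pose[RIGHT_WRIST] = SimpleNamespace(visibility=0.1, presence=0.2)

left_score = hand_confidence_proxy(pose, "left")
right_score = hand_confidence_proxy(pose, "right")
print(left_score, right_score)
```

Taking the minimum of the two fields is a conservative choice made here for illustration; a threshold on the returned value would have to be tuned on real data.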
Thank you for raising this. As this is our newest Task, we likely need to invest a bit more time here.
It's an old issue:
#3505
I am experiencing the same issue (holistic task on web), any updates on this?
Wow, this is really unprofessional from you guys... There are multiple issues about this. Can someone give a proper explanation of the state of visibility/confidence for pose, face and hand landmark detection? @schmidt-sebastian @kuaashish
Can you at least please confirm that every landmark will be predicted even if it is not present in the image? If so, can it ever be out of bounds of the base image? @schmidt-sebastian @kuaashish https://github.com/google-ai-edge/mediapipe/issues/3159
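On the out-of-bounds part of the question: the log at the top of this thread contains a normalized pose landmark with y=1.8102238178253174, so at least in that output a coordinate does fall outside the [0, 1] image range. A minimal sketch for flagging such points follows; `out_of_frame` is a hypothetical helper, and the two sample landmarks are the first and last pose landmarks printed in the log, so the returned index refers to this two-element sample rather than to the full pose topology.

```python
from types import SimpleNamespace
from typing import List, Sequence


def out_of_frame(landmarks: Sequence) -> List[int]:
    """Return indices of landmarks whose normalized x or y lies outside [0, 1]."""
    return [
        i
        for i, lm in enumerate(landmarks)
        if not (0.0 <= lm.x <= 1.0 and 0.0 <= lm.y <= 1.0)
    ]


# First and last pose landmarks as printed in the log above.
sample_pose = [
    SimpleNamespace(x=0.47517403960227966, y=0.3143022358417511),
    SimpleNamespace(x=0.41832780838012695, y=1.8102238178253174),
]
outside = out_of_frame(sample_pose)
print(outside)  # [1]
```

In that log the out-of-frame landmark also has very low visibility and presence (about 0.006 and 0.001), which suggests filtering on those fields where they are populated; for the hand and face groups that is not possible while they stay at 0.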