mediapipe
Presence/Visibility Scores Remain at 0.99 even for joints not visible
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
No
OS Platform and Distribution
Linux Ubuntu 20.04
MediaPipe version
0.10.10
Bazel version
No response
Solution
Pose
Programming Language and version
Python = 3.10.0
Describe the actual behavior
The visibility and presence scores for joints that are not visible in the video stay around 0.99.
Describe the expected behaviour
The visibility and/or presence scores for joints that are not visible in the video should be close to 0. The video shows a single person filmed from behind, so the nose and eyes should not be visible at all.
Standalone code/steps you may have used to try to get what you need
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import cv2 as cv

model_path = '/home/cvpr/CVPR/pose/pose_landmarker_lite.task'

BaseOptions = mp.tasks.BaseOptions
PoseLandmarker = mp.tasks.vision.PoseLandmarker
PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

# Create a pose landmarker instance with the video mode:
options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=model_path),
    running_mode=VisionRunningMode.VIDEO)

with PoseLandmarker.create_from_options(options) as landmarker:
    video_path = '/home/cvpr/CVPR/pose/raw/1.mp4'
    cap = cv.VideoCapture(video_path)
    frame_count = cap.get(cv.CAP_PROP_FRAME_COUNT)
    fps = cap.get(cv.CAP_PROP_FPS)
    for i in range(int(frame_count)):
        success, frame = cap.read()
        if not success:
            break
        frame_rgb = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
        # Convert the frame to MediaPipe's Image object.
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)
        # Calculate the timestamp for the current frame
        timestamp_ms = int((i / fps) * 1000)  # Convert frame index to milliseconds
        # Process the image using the pose landmarker.
        results = landmarker.detect_for_video(mp_image, timestamp_ms)
        landmarks = results.pose_landmarks
        # The list of poses may be empty when nothing is detected, so check before indexing
        if landmarks:
            # Landmark 0 of the first detected pose is the nose
            print(i, '.', landmarks[0][0])
    cap.release()
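The script above prints only landmark 0 (the nose). To check whether the other face landmarks behave the same way, the scores for indices 0 to 10 can be listed by name. The following is a minimal sketch, not part of the original report: `face_scores` and `FakeLandmark` are hypothetical helpers introduced here, and the demo values are made up. In the real script, `results.pose_landmarks[0]` would be passed instead of `demo_pose`.

```python
from dataclasses import dataclass

# Indices 0-10 of the pose landmark model cover the face.
FACE_LANDMARK_NAMES = [
    "nose", "left_eye_inner", "left_eye", "left_eye_outer",
    "right_eye_inner", "right_eye", "right_eye_outer",
    "left_ear", "right_ear", "mouth_left", "mouth_right",
]


@dataclass
class FakeLandmark:
    """Hypothetical stand-in for NormalizedLandmark, used only for this demo."""
    visibility: float
    presence: float


def face_scores(pose):
    """Return (name, visibility, presence) for the face landmarks of one pose."""
    return [(name, lm.visibility, lm.presence)
            for name, lm in zip(FACE_LANDMARK_NAMES, pose)]


# Demo with made-up values for a 33-landmark pose.
demo_pose = [FakeLandmark(0.99, 0.99) for _ in range(33)]
for name, vis, pres in face_scores(demo_pose):
    print(f"{name:16s} visibility={vis:.4f} presence={pres:.4f}")
```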
Other info / Complete Logs
Printed output for the first frames of the processed video (landmark 0, the nose):
1 . NormalizedLandmark(x=0.5442178249359131, y=0.2998196482658386, z=-0.0007978460052981973, visibility=0.9999915957450867, presence=0.9999762773513794)
2 . NormalizedLandmark(x=0.5442141890525818, y=0.300722599029541, z=-0.010013706050813198, visibility=0.999991774559021, presence=0.999982476234436)
3 . NormalizedLandmark(x=0.5441888570785522, y=0.3029216527938843, z=-0.013496209867298603, visibility=0.9999918937683105, presence=0.9999829530715942)
4 . NormalizedLandmark(x=0.5441299676895142, y=0.30513015389442444, z=-0.013967284001410007, visibility=0.9999919533729553, presence=0.9999831914901733)
5 . NormalizedLandmark(x=0.5439504981040955, y=0.3082234561443329, z=-0.01848534680902958, visibility=0.9999922513961792, presence=0.9999873638153076)
6 . NormalizedLandmark(x=0.5434966683387756, y=0.31141090393066406, z=-0.014503104612231255, visibility=0.9999924898147583, presence=0.9999860525131226)
7 . NormalizedLandmark(x=0.5432760119438171, y=0.31242284178733826, z=-0.015437656082212925, visibility=0.9999925494194031, presence=0.9999814033508301)
8 . NormalizedLandmark(x=0.5429794192314148, y=0.3155577480792999, z=-0.018677139654755592, visibility=0.9999927282333374, presence=0.9999845027923584)
9 . NormalizedLandmark(x=0.542636513710022, y=0.31824660301208496, z=-0.017104240134358406, visibility=0.9999929070472717, presence=0.9999860525131226)
10 . NormalizedLandmark(x=0.5424553751945496, y=0.321556031703949, z=-0.016628123819828033, visibility=0.9999930262565613, presence=0.9999852180480957)
11 . NormalizedLandmark(x=0.541935384273529, y=0.3264073431491852, z=-0.01674201898276806, visibility=0.999993085861206, presence=0.9999871253967285)
12 . NormalizedLandmark(x=0.5414992570877075, y=0.3288175165653229, z=-0.016105124726891518, visibility=0.999993085861206, presence=0.9999833106994629)
13 . NormalizedLandmark(x=0.540871262550354, y=0.32988888025283813, z=-0.018359722569584846, visibility=0.9999930262565613, presence=0.999982476234436)
14 . NormalizedLandmark(x=0.5406372547149658, y=0.3321627974510193, z=-0.017809180542826653, visibility=0.9999930262565613, presence=0.9999840259552002)
15 . NormalizedLandmark(x=0.5402767062187195, y=0.33617469668388367, z=-0.012142944149672985, visibility=0.999992847442627, presence=0.9999817609786987)
16 . NormalizedLandmark(x=0.5398808717727661, y=0.3393665850162506, z=-0.01216091588139534, visibility=0.9999927282333374, presence=0.9999812841415405)
17 . NormalizedLandmark(x=0.5396113991737366, y=0.34290677309036255, z=-0.009626520797610283, visibility=0.9999924898147583, presence=0.9999781847000122)
18 . NormalizedLandmark(x=0.5394593477249146, y=0.34623265266418457, z=-0.009870109148323536, visibility=0.9999920725822449, presence=0.999975323677063)
19 . NormalizedLandmark(x=0.5392148494720459, y=0.3459140658378601, z=-0.013107147999107838, visibility=0.999991774559021, presence=0.9999730587005615)
20 . NormalizedLandmark(x=0.5391708016395569, y=0.34913840889930725, z=-0.013900247402489185, visibility=0.9999916553497314, presence=0.9999794960021973)
21 . NormalizedLandmark(x=0.538922905921936, y=0.3524303436279297, z=-0.01585184782743454, visibility=0.9999917149543762, presence=0.9999816417694092)
22 . NormalizedLandmark(x=0.5387272238731384, y=0.35561293363571167, z=-0.027290476486086845, visibility=0.9999918937683105, presence=0.9999841451644897)
23 . NormalizedLandmark(x=0.5380319356918335, y=0.3586253821849823, z=-0.031669482588768005, visibility=0.9999920725822449, presence=0.9999829530715942)
24 . NormalizedLandmark(x=0.5372130870819092, y=0.36033308506011963, z=-0.037193603813648224, visibility=0.9999924302101135, presence=0.9999877214431763)
25 . NormalizedLandmark(x=0.5368179082870483, y=0.36050349473953247, z=-0.03686932474374771, visibility=0.9999926090240479, presence=0.9999849796295166)
26 . NormalizedLandmark(x=0.5364562273025513, y=0.36210164427757263, z=-0.03723995387554169, visibility=0.9999927878379822, presence=0.9999860525131226)
27 . NormalizedLandmark(x=0.5362903475761414, y=0.364221453666687, z=-0.03967200592160225, visibility=0.9999930262565613, presence=0.9999871253967285)
28 . NormalizedLandmark(x=0.5359679460525513, y=0.3655650317668915, z=-0.04274267703294754, visibility=0.9999933242797852, presence=0.9999887943267822)
29 . NormalizedLandmark(x=0.5354193449020386, y=0.3670402765274048, z=-0.054456669837236404, visibility=0.9999936819076538, presence=0.9999910593032837)
30 . NormalizedLandmark(x=0.5347834825515747, y=0.36873146891593933, z=-0.05036207661032677, visibility=0.9999939203262329, presence=0.9999876022338867)
31 . NormalizedLandmark(x=0.5342275500297546, y=0.36923372745513916, z=-0.05105500668287277, visibility=0.999994158744812, presence=0.9999887943267822)
32 . NormalizedLandmark(x=0.5335609316825867, y=0.37038344144821167, z=-0.050282880663871765, visibility=0.9999942779541016, presence=0.9999868869781494)
33 . NormalizedLandmark(x=0.5335651636123657, y=0.3710387051105499, z=-0.050193388015031815, visibility=0.9999945163726807, presence=0.9999890327453613)
34 . NormalizedLandmark(x=0.5336644649505615, y=0.37067440152168274, z=-0.05295398086309433, visibility=0.9999946355819702, presence=0.9999873638153076)
35 . NormalizedLandmark(x=0.5342223048210144, y=0.3706806004047394, z=-0.05733434855937958, visibility=0.9999947547912598, presence=0.99998939037323)
36 . NormalizedLandmark(x=0.5361438393592834, y=0.3705573081970215, z=-0.057974644005298615, visibility=0.9999948143959045, presence=0.9999879598617554)
37 . NormalizedLandmark(x=0.5374116897583008, y=0.3703038692474365, z=-0.06226826086640358, visibility=0.9999948740005493, presence=0.9999885559082031)
38 . NormalizedLandmark(x=0.5386218428611755, y=0.37009063363075256, z=-0.06576354801654816, visibility=0.9999949932098389, presence=0.9999901056289673)
39 . NormalizedLandmark(x=0.5403669476509094, y=0.36899101734161377, z=-0.06386437267065048, visibility=0.9999949932098389, presence=0.9999867677688599)
40 . NormalizedLandmark(x=0.5412922501564026, y=0.36888203024864197, z=-0.06912153214216232, visibility=0.9999951124191284, presence=0.9999896287918091)
41 . NormalizedLandmark(x=0.5411357879638672, y=0.3688073456287384, z=-0.07635979354381561, visibility=0.9999951124191284, presence=0.9999872446060181)
42 . NormalizedLandmark(x=0.5394794940948486, y=0.3687383234500885, z=-0.07891643792390823, visibility=0.9999950528144836, presence=0.9999854564666748)
43 . NormalizedLandmark(x=0.5377272963523865, y=0.3692466616630554, z=-0.08878250420093536, visibility=0.9999951720237732, presence=0.9999871253967285)
44 . NormalizedLandmark(x=0.5342525243759155, y=0.3692271113395691, z=-0.10355834662914276, visibility=0.999995231628418, presence=0.999983549118042)
45 . NormalizedLandmark(x=0.5316548943519592, y=0.36998069286346436, z=-0.09924280643463135, visibility=0.9999950528144836, presence=0.9999736547470093)
46 . NormalizedLandmark(x=0.5306356549263, y=0.37102752923965454, z=-0.09367218613624573, visibility=0.9999947547912598, presence=0.9999679327011108)
47 . NormalizedLandmark(x=0.5283324718475342, y=0.37098363041877747, z=-0.09640727937221527, visibility=0.9999945759773254, presence=0.9999693632125854)
48 . NormalizedLandmark(x=0.5262799263000488, y=0.37057167291641235, z=-0.09843923151493073, visibility=0.9999943971633911, presence=0.999967098236084)
49 . NormalizedLandmark(x=0.5266479849815369, y=0.36890748143196106, z=-0.09951785206794739, visibility=0.9999940395355225, presence=0.9999550580978394)
50 . NormalizedLandmark(x=0.5263305306434631, y=0.36809614300727844, z=-0.10194762796163559, visibility=0.9999938607215881, presence=0.9999643564224243)
51 . NormalizedLandmark(x=0.5267273187637329, y=0.3673267662525177, z=-0.10295315831899643, visibility=0.9999935626983643, presence=0.9999595880508423)
I'm encountering unexpected results with MediaPipe's Pose Landmark detection.
Video: Link to YouTube video: https://www.youtube.com/watch?v=EFY460oquXw
Video Details:
Frame rate (FPS): 30
Resolution: 1080x1920 (Height x Width)
Single person in view, filmed from behind (no overlapping people)
Expected Behavior: Since the person is facing away, landmarks like the nose and eyes should have low visibility scores.
Actual Behavior: All three Pose Landmark models I downloaded (lite, full, and heavy) consistently report high visibility and presence scores for the nose, eyes, and other front-facing joints.
Models Used: Downloaded the latest versions of all the models (lite, full, and heavy) from the official MediaPipe website: https://developers.google.com/mediapipe/solutions/vision/pose_landmarker
I'm hoping to understand why all three models assign high scores to landmarks that should be occluded, and to find a way to lower the visibility scores of non-visible joints so that visible and non-visible joints can be separated.
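The kind of separation described here could be sketched roughly as follows. This is only an illustration under assumptions: `split_by_score` and `FakeLandmark` are hypothetical names, the 0.5 threshold is arbitrary, and the second demo landmark uses made-up scores. The first demo landmark uses the nose scores from the first log line above.

```python
from dataclasses import dataclass


@dataclass
class FakeLandmark:
    """Hypothetical stand-in for NormalizedLandmark, used only for this demo."""
    visibility: float
    presence: float


def split_by_score(pose, threshold=0.5):
    """Split landmark indices into (visible, hidden) lists.

    A landmark counts as visible when min(visibility, presence) >= threshold.
    The threshold value is an assumption, not a MediaPipe default.
    """
    visible, hidden = [], []
    for idx, lm in enumerate(pose):
        score = min(lm.visibility, lm.presence)
        (visible if score >= threshold else hidden).append(idx)
    return visible, hidden


demo_pose = [
    FakeLandmark(0.9999915957450867, 0.9999762773513794),  # nose, first log line
    FakeLandmark(0.05, 0.9),                               # made-up low-visibility joint
]
visible, hidden = split_by_score(demo_pose)
print("visible:", visible, "hidden:", hidden)
```

With scores like those in the log, the nose ends up in the visible group at any reasonable threshold, which is exactly the problem: thresholding cannot separate the joints while the scores stay near 1.0.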
Let me know if there is any other information about my system or MediaPipe setup that I can provide to help debug the issue. Thank you.
@Nit-Rathore Did you get any solution for this?
@ErikValle2 Unfortunately not. I switched to another pose detection model for visibility.
Hi @Nit-Rathore,
I apologize for the delayed response. Could you please confirm if this issue has been resolved on your end, or if you still require assistance from us?
Thank you!!
Hey, I still require assistance, as the problem hasn't been solved.
Hi @Nit-Rathore,
Unfortunately, it appears that this is a bug in our pose detection model. We need to fix this. For now, we are marking it as a bug and sharing it with our team, but we cannot provide a timeline for the fix.
Thank you!!
Hi @kuaashish, are there any updates on this issue?
Did anyone find a workaround for this?