Presence/Visibility Scores Remain at 0.99 even for joints not visible

Open Nit-Rathore opened this issue 1 year ago • 8 comments

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

Linux Ubuntu 20.04

MediaPipe version

0.10.10

Bazel version

No response

Solution

Pose

Programming Language and version

Python = 3.10.0

Describe the actual behavior

The visibility and/or presence scores for joints that are not visible in the video stay around 0.99.

Describe the expected behaviour

The visibility and/or presence scores for joints that are not visible in the video should be close to 0. The video shows a single person filmed from behind, so the nose and eyes should not be visible at all.

Standalone code/steps you may have used to try to get what you need

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import cv2 as cv

model_path = ('/home/cvpr/CVPR/pose/pose_landmarker_lite.task')

BaseOptions = mp.tasks.BaseOptions
PoseLandmarker = mp.tasks.vision.PoseLandmarker
PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

# Create a pose landmarker instance with the video mode:
options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=model_path),
    running_mode=VisionRunningMode.VIDEO)

with PoseLandmarker.create_from_options(options) as landmarker:
    video_path = '/home/cvpr/CVPR/pose/raw/1.mp4'
    cap = cv.VideoCapture(video_path)
    frame_count = cap.get(cv.CAP_PROP_FRAME_COUNT)
    fps = cap.get(cv.CAP_PROP_FPS)

    for i in range(int(frame_count)):
        success, frame = cap.read()
        if not success:
            break

        frame_rgb = cv.cvtColor(frame, cv.COLOR_BGR2RGB)

        # Convert the frame to MediaPipe’s Image object.
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)

        # Calculate the timestamp for the current frame
        timestamp_ms = int((i / fps) * 1000)  # Convert frame index to milliseconds

        # Process the image using the pose landmarker.
        results = landmarker.detect_for_video(mp_image, timestamp_ms)
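        landmarks = results.pose_landmarks

        # pose_landmarks holds one list of landmarks per detected pose and is
        # an empty list (not None) when no pose is found, so test truthiness.
        if landmarks:
            # Landmark 0 of the first detected pose is the nose.
            print(i, '.', landmarks[0][0])

    cap.release()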
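
The script above prints only landmark 0 (the nose). To compare scores across several landmarks per frame, a small helper along these lines could be called on `results.pose_landmarks[0]`. This is a sketch, not part of the original report: the names `FACE_AND_SHOULDER_IDS`, `score_table`, and `FakeLandmark` are hypothetical, and the helper only assumes each landmark exposes `.visibility` and `.presence`, as the printed `NormalizedLandmark` objects do.

```python
from dataclasses import dataclass

# Indices follow the 33-point MediaPipe Pose landmark layout.
FACE_AND_SHOULDER_IDS = {
    "nose": 0,
    "left_eye": 2,
    "right_eye": 5,
    "left_shoulder": 11,
    "right_shoulder": 12,
}


def score_table(pose, ids=FACE_AND_SHOULDER_IDS):
    """Return {name: (visibility, presence)} for the selected landmarks.

    `pose` is one entry of `results.pose_landmarks`: a sequence of objects
    that expose `.visibility` and `.presence`.
    """
    return {name: (pose[i].visibility, pose[i].presence) for name, i in ids.items()}


@dataclass
class FakeLandmark:
    """Hypothetical stand-in so the sketch runs without MediaPipe installed."""
    visibility: float
    presence: float


if __name__ == "__main__":
    pose = [FakeLandmark(0.99, 0.99) for _ in range(33)]
    for name, (vis, pres) in score_table(pose).items():
        print(f"{name:15s} visibility={vis:.4f} presence={pres:.4f}")
```

Inside the frame loop this would be used as `score_table(results.pose_landmarks[0])` once the result is known to be non-empty.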

Other info / Complete Logs

Printing the first 50 frames for the processed video.


1 . NormalizedLandmark(x=0.5442178249359131, y=0.2998196482658386, z=-0.0007978460052981973, visibility=0.9999915957450867, presence=0.9999762773513794)
2 . NormalizedLandmark(x=0.5442141890525818, y=0.300722599029541, z=-0.010013706050813198, visibility=0.999991774559021, presence=0.999982476234436)
3 . NormalizedLandmark(x=0.5441888570785522, y=0.3029216527938843, z=-0.013496209867298603, visibility=0.9999918937683105, presence=0.9999829530715942)
4 . NormalizedLandmark(x=0.5441299676895142, y=0.30513015389442444, z=-0.013967284001410007, visibility=0.9999919533729553, presence=0.9999831914901733)
5 . NormalizedLandmark(x=0.5439504981040955, y=0.3082234561443329, z=-0.01848534680902958, visibility=0.9999922513961792, presence=0.9999873638153076)
6 . NormalizedLandmark(x=0.5434966683387756, y=0.31141090393066406, z=-0.014503104612231255, visibility=0.9999924898147583, presence=0.9999860525131226)
7 . NormalizedLandmark(x=0.5432760119438171, y=0.31242284178733826, z=-0.015437656082212925, visibility=0.9999925494194031, presence=0.9999814033508301)
8 . NormalizedLandmark(x=0.5429794192314148, y=0.3155577480792999, z=-0.018677139654755592, visibility=0.9999927282333374, presence=0.9999845027923584)
9 . NormalizedLandmark(x=0.542636513710022, y=0.31824660301208496, z=-0.017104240134358406, visibility=0.9999929070472717, presence=0.9999860525131226)
10 . NormalizedLandmark(x=0.5424553751945496, y=0.321556031703949, z=-0.016628123819828033, visibility=0.9999930262565613, presence=0.9999852180480957)
11 . NormalizedLandmark(x=0.541935384273529, y=0.3264073431491852, z=-0.01674201898276806, visibility=0.999993085861206, presence=0.9999871253967285)
12 . NormalizedLandmark(x=0.5414992570877075, y=0.3288175165653229, z=-0.016105124726891518, visibility=0.999993085861206, presence=0.9999833106994629)
13 . NormalizedLandmark(x=0.540871262550354, y=0.32988888025283813, z=-0.018359722569584846, visibility=0.9999930262565613, presence=0.999982476234436)
14 . NormalizedLandmark(x=0.5406372547149658, y=0.3321627974510193, z=-0.017809180542826653, visibility=0.9999930262565613, presence=0.9999840259552002)
15 . NormalizedLandmark(x=0.5402767062187195, y=0.33617469668388367, z=-0.012142944149672985, visibility=0.999992847442627, presence=0.9999817609786987)
16 . NormalizedLandmark(x=0.5398808717727661, y=0.3393665850162506, z=-0.01216091588139534, visibility=0.9999927282333374, presence=0.9999812841415405)
17 . NormalizedLandmark(x=0.5396113991737366, y=0.34290677309036255, z=-0.009626520797610283, visibility=0.9999924898147583, presence=0.9999781847000122)
18 . NormalizedLandmark(x=0.5394593477249146, y=0.34623265266418457, z=-0.009870109148323536, visibility=0.9999920725822449, presence=0.999975323677063)
19 . NormalizedLandmark(x=0.5392148494720459, y=0.3459140658378601, z=-0.013107147999107838, visibility=0.999991774559021, presence=0.9999730587005615)
20 . NormalizedLandmark(x=0.5391708016395569, y=0.34913840889930725, z=-0.013900247402489185, visibility=0.9999916553497314, presence=0.9999794960021973)
21 . NormalizedLandmark(x=0.538922905921936, y=0.3524303436279297, z=-0.01585184782743454, visibility=0.9999917149543762, presence=0.9999816417694092)
22 . NormalizedLandmark(x=0.5387272238731384, y=0.35561293363571167, z=-0.027290476486086845, visibility=0.9999918937683105, presence=0.9999841451644897)
23 . NormalizedLandmark(x=0.5380319356918335, y=0.3586253821849823, z=-0.031669482588768005, visibility=0.9999920725822449, presence=0.9999829530715942)
24 . NormalizedLandmark(x=0.5372130870819092, y=0.36033308506011963, z=-0.037193603813648224, visibility=0.9999924302101135, presence=0.9999877214431763)
25 . NormalizedLandmark(x=0.5368179082870483, y=0.36050349473953247, z=-0.03686932474374771, visibility=0.9999926090240479, presence=0.9999849796295166)
26 . NormalizedLandmark(x=0.5364562273025513, y=0.36210164427757263, z=-0.03723995387554169, visibility=0.9999927878379822, presence=0.9999860525131226)
27 . NormalizedLandmark(x=0.5362903475761414, y=0.364221453666687, z=-0.03967200592160225, visibility=0.9999930262565613, presence=0.9999871253967285)
28 . NormalizedLandmark(x=0.5359679460525513, y=0.3655650317668915, z=-0.04274267703294754, visibility=0.9999933242797852, presence=0.9999887943267822)
29 . NormalizedLandmark(x=0.5354193449020386, y=0.3670402765274048, z=-0.054456669837236404, visibility=0.9999936819076538, presence=0.9999910593032837)
30 . NormalizedLandmark(x=0.5347834825515747, y=0.36873146891593933, z=-0.05036207661032677, visibility=0.9999939203262329, presence=0.9999876022338867)
31 . NormalizedLandmark(x=0.5342275500297546, y=0.36923372745513916, z=-0.05105500668287277, visibility=0.999994158744812, presence=0.9999887943267822)
32 . NormalizedLandmark(x=0.5335609316825867, y=0.37038344144821167, z=-0.050282880663871765, visibility=0.9999942779541016, presence=0.9999868869781494)
33 . NormalizedLandmark(x=0.5335651636123657, y=0.3710387051105499, z=-0.050193388015031815, visibility=0.9999945163726807, presence=0.9999890327453613)
34 . NormalizedLandmark(x=0.5336644649505615, y=0.37067440152168274, z=-0.05295398086309433, visibility=0.9999946355819702, presence=0.9999873638153076)
35 . NormalizedLandmark(x=0.5342223048210144, y=0.3706806004047394, z=-0.05733434855937958, visibility=0.9999947547912598, presence=0.99998939037323)
36 . NormalizedLandmark(x=0.5361438393592834, y=0.3705573081970215, z=-0.057974644005298615, visibility=0.9999948143959045, presence=0.9999879598617554)
37 . NormalizedLandmark(x=0.5374116897583008, y=0.3703038692474365, z=-0.06226826086640358, visibility=0.9999948740005493, presence=0.9999885559082031)
38 . NormalizedLandmark(x=0.5386218428611755, y=0.37009063363075256, z=-0.06576354801654816, visibility=0.9999949932098389, presence=0.9999901056289673)
39 . NormalizedLandmark(x=0.5403669476509094, y=0.36899101734161377, z=-0.06386437267065048, visibility=0.9999949932098389, presence=0.9999867677688599)
40 . NormalizedLandmark(x=0.5412922501564026, y=0.36888203024864197, z=-0.06912153214216232, visibility=0.9999951124191284, presence=0.9999896287918091)
41 . NormalizedLandmark(x=0.5411357879638672, y=0.3688073456287384, z=-0.07635979354381561, visibility=0.9999951124191284, presence=0.9999872446060181)
42 . NormalizedLandmark(x=0.5394794940948486, y=0.3687383234500885, z=-0.07891643792390823, visibility=0.9999950528144836, presence=0.9999854564666748)
43 . NormalizedLandmark(x=0.5377272963523865, y=0.3692466616630554, z=-0.08878250420093536, visibility=0.9999951720237732, presence=0.9999871253967285)
44 . NormalizedLandmark(x=0.5342525243759155, y=0.3692271113395691, z=-0.10355834662914276, visibility=0.999995231628418, presence=0.999983549118042)
45 . NormalizedLandmark(x=0.5316548943519592, y=0.36998069286346436, z=-0.09924280643463135, visibility=0.9999950528144836, presence=0.9999736547470093)
46 . NormalizedLandmark(x=0.5306356549263, y=0.37102752923965454, z=-0.09367218613624573, visibility=0.9999947547912598, presence=0.9999679327011108)
47 . NormalizedLandmark(x=0.5283324718475342, y=0.37098363041877747, z=-0.09640727937221527, visibility=0.9999945759773254, presence=0.9999693632125854)
48 . NormalizedLandmark(x=0.5262799263000488, y=0.37057167291641235, z=-0.09843923151493073, visibility=0.9999943971633911, presence=0.999967098236084)
49 . NormalizedLandmark(x=0.5266479849815369, y=0.36890748143196106, z=-0.09951785206794739, visibility=0.9999940395355225, presence=0.9999550580978394)
50 . NormalizedLandmark(x=0.5263305306434631, y=0.36809614300727844, z=-0.10194762796163559, visibility=0.9999938607215881, presence=0.9999643564224243)
51 . NormalizedLandmark(x=0.5267273187637329, y=0.3673267662525177, z=-0.10295315831899643, visibility=0.9999935626983643, presence=0.9999595880508423)
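
For a longer log, the score ranges can be extracted rather than read by eye. The sketch below parses lines in the format printed above and reports the minimum and maximum visibility and presence. The names `LOG`, `PATTERN`, and `score_ranges` are hypothetical, and the sample text contains only two lines copied from the log above (frames 1 and 51).

```python
import re

# Two lines copied verbatim from the log above (frames 1 and 51).
LOG = """\
1 . NormalizedLandmark(x=0.5442178249359131, y=0.2998196482658386, z=-0.0007978460052981973, visibility=0.9999915957450867, presence=0.9999762773513794)
51 . NormalizedLandmark(x=0.5267273187637329, y=0.3673267662525177, z=-0.10295315831899643, visibility=0.9999935626983643, presence=0.9999595880508423)
"""

# Captures the visibility and presence values at the end of each printed landmark.
PATTERN = re.compile(r"visibility=([0-9.eE+-]+), presence=([0-9.eE+-]+)\)")


def score_ranges(log_text):
    """Return ((min_vis, max_vis), (min_pres, max_pres)) over all matching lines."""
    pairs = [(float(v), float(p)) for v, p in PATTERN.findall(log_text)]
    if not pairs:
        raise ValueError("no NormalizedLandmark lines found")
    vis = [v for v, _ in pairs]
    pres = [p for _, p in pairs]
    return (min(vis), max(vis)), (min(pres), max(pres))


if __name__ == "__main__":
    (vis_min, vis_max), (pres_min, pres_max) = score_ranges(LOG)
    print(f"visibility: {vis_min:.6f} .. {vis_max:.6f}")
    print(f"presence:   {pres_min:.6f} .. {pres_max:.6f}")
```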

Nit-Rathore, Mar 06 '24 13:03

I'm encountering unexpected results with MediaPipe's Pose Landmark detection.

Video (YouTube): https://www.youtube.com/watch?v=EFY460oquXw

Video Details: Frame rate: 30 FPS. Resolution: 1080x1920 (Height x Width). Single person in view, filmed from behind (no overlapping people).

Expected Behavior: Since the person is facing away, landmarks like the nose and eyes should have low visibility scores.

Actual Behavior: All three Pose Landmark models I downloaded (lite, full, and heavy) consistently report high visibility and presence scores for the nose, eyes, and other front-facing joints.

Models Used: I downloaded the latest versions of all three models (lite, full, and heavy) from the official MediaPipe website: https://developers.google.com/mediapipe/solutions/vision/pose_landmarker

I'm hoping to understand why all three models assign high scores to landmarks that should be obscured, and to find a way to lower the visibility scores of non-visible joints so that visible and non-visible joints can be separated.

Nit-Rathore, Mar 06 '24 13:03

Let me know if there is any other information about my system or MediaPipe setup that I can provide to help debug the issue. Thank you.

Nit-Rathore, Mar 13 '24 14:03

@Nit-Rathore Did you find a solution for this?

ErikValle2, May 31 '24 12:05

@ErikValle2 Unfortunately not. I switched to another pose detection model for visibility.

Nit-Rathore, Jun 02 '24 03:06

Hi @Nit-Rathore,

I apologize for the delayed response. Could you please confirm if this issue has been resolved on your end, or if you still require assistance from us?

Thank you!!

kuaashish, Jun 28 '24 07:06

Hey, I still require assistance, as the problem hasn't been solved.

Nit-Rathore, Jun 28 '24 11:06

Hi @Nit-Rathore,

Unfortunately, it appears that this is a bug in our pose detection model. We need to fix this. For now, we are marking it as a bug and sharing it with our team, but we cannot provide a timeline for the fix.

Thank you!!

kuaashish, Jul 30 '24 08:07

Hi @kuaashish, are there any updates on this issue?

Did anyone find a workaround for this?

kamilafsar, May 20 '25 12:05
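
No workaround was posted in this thread. As one possible stopgap while the reported scores are unreliable, the body orientation can be guessed from geometry instead of from `visibility`. The sketch below is an assumption-laden heuristic, not a fix for the model and not something suggested by the MediaPipe team: in an unmirrored image of a person facing the camera, their left shoulder (landmark 11) appears at a larger `x` than their right shoulder (landmark 12), so the opposite ordering suggests a back view. It relies on the model labelling left and right correctly, which is not guaranteed for a back view. The names `Point`, `FRONT_ONLY_IDS`, `is_facing_away`, `adjusted_visibility`, and the `min_width` threshold are all hypothetical.

```python
from dataclasses import dataclass

LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
# Nose, eyes, and mouth in the 33-point layout; ears (7, 8) are left out
# because they can remain visible from behind.
FRONT_ONLY_IDS = (0, 1, 2, 3, 4, 5, 6, 9, 10)


@dataclass
class Point:
    """Hypothetical stand-in for a landmark with `.x` and `.visibility`."""
    x: float
    visibility: float


def is_facing_away(pose, min_width=0.02):
    """Guess whether the subject's back is to the camera.

    Returns True for a likely back view, False for a likely front view, and
    None when the shoulders nearly overlap in x (side view, ambiguous).
    `min_width` is an arbitrary threshold in normalized image units.
    """
    dx = pose[LEFT_SHOULDER].x - pose[RIGHT_SHOULDER].x
    if abs(dx) < min_width:
        return None
    return dx < 0


def adjusted_visibility(pose):
    """Return per-landmark visibility, zeroing front-only landmarks for a back view."""
    vis = [lm.visibility for lm in pose]
    if is_facing_away(pose):
        for i in FRONT_ONLY_IDS:
            vis[i] = 0.0
    return vis
```

With real results this would be called as `adjusted_visibility(results.pose_landmarks[0])`. Side views return the scores unchanged, so the heuristic only helps for clearly front-facing or back-facing subjects.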