
Explanation of output shape [1, 117] (World landmarks for pose) or [1, 195] (Pose landmarks) of pose_landmarks_detector.tflite in Mediapipe

Open mbkamran opened this issue 5 months ago • 0 comments

I downloaded the pose_landmarker_lite.task file from the official MediaPipe guide for Pose Landmark Detection here:

image

To access its .tflite models, I unzipped it with unzip pose_landmarker_lite.task and got two files: pose_detector.tflite and pose_landmarks_detector.tflite.
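For anyone who wants to reproduce the extraction step without the unzip command, here is a minimal sketch. It assumes only what the step above relies on, namely that the .task bundle is an ordinary ZIP archive; the function name extract_task_bundle and the output directory are hypothetical names chosen for illustration.

```python
import zipfile
from pathlib import Path


def extract_task_bundle(task_path, out_dir):
    """Extract every member of a .task bundle and return the member names.

    Assumption: the bundle is a plain ZIP archive, which is why the
    command-line unzip tool can open it.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(task_path) as bundle:
        names = bundle.namelist()  # e.g. the two .tflite files
        bundle.extractall(out)
    return names
```

Calling extract_task_bundle with the path of the downloaded .task file and a target directory such as "pose_models" should return the names of the bundled .tflite files and write them to that directory.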

Question 1: How do we interpret these models and how are they being used for tasks?

pose_landmarks_detector.tflite appears to be the model that performs the actual pose landmark detection: visualizing the structure and outputs of both models in the Netron app shows that this model has the pose outputs:

image

However, I have difficulty understanding the shapes and meaning of the two outputs: "Pose landmarks" with output shape [1,195] and "World landmarks for pose" with output shape [1,117].

Question 2: How do we interpret the shapes [1,195] and [1,117]?
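One reading that fits the arithmetic, stated here as an assumption rather than a confirmed answer: 195 = 39 × 5 and 117 = 39 × 3, so both outputs could describe the same 39 points, with five values per point in "Pose landmarks" (possibly x, y, z, visibility, presence) and three values per point in "World landmarks for pose" (possibly x, y, z). The count of 39 would then be the 33 documented pose landmarks plus 6 auxiliary points. The sketch below only demonstrates the grouping; the helper name split_flat_output and the per-landmark field meanings are hypothetical.

```python
def split_flat_output(flat, values_per_landmark):
    """Group a flat sequence of numbers into one tuple per landmark.

    Assumption: values belonging to one landmark are stored contiguously.
    """
    if len(flat) % values_per_landmark != 0:
        raise ValueError("length is not a multiple of values_per_landmark")
    return [
        tuple(flat[i:i + values_per_landmark])
        for i in range(0, len(flat), values_per_landmark)
    ]


# Dummy data standing in for the two flattened model outputs.
pose = split_flat_output(list(range(195)), 5)   # assumed (x, y, z, visibility, presence)
world = split_flat_output(list(range(117)), 3)  # assumed (x, y, z)
print(len(pose), len(world))  # 39 39
```

If this reading is right, the first 33 entries of each list would correspond to the documented body landmarks, and the remaining 6 would be auxiliary points.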

And finally,

Question 3: How do we interpret the structure of the model, especially how it relates to BlazePose and MobileNetV2? Also, is there any support for fine-tuning, i.e. reusing the trained backbone of this model and writing a custom head?

mbkamran, Sep 12 '24 14:09