Explanation of output shape [1, 117] (World landmarks for pose) or [1, 195] (Pose landmarks) of pose_landmarks_detector.tflite in Mediapipe

Open mbkamran opened this issue 5 months ago • 0 comments

I downloaded the `pose_landmarker_lite.task` file from the official MediaPipe guide for Pose Landmark Detection.

To access its `.tflite` models, I unzipped it with `unzip pose_landmarker_lite.task` and got two files: `pose_detector.tflite` and `pose_landmarks_detector.tflite`.
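For anyone who prefers to do the extraction from Python, here is a minimal sketch of the same step. It relies only on the fact that the `.task` bundle opens as a regular zip archive, as the `unzip` command above shows. The function name `extract_task_models` and its arguments are hypothetical and not part of any MediaPipe API.

```python
import zipfile
from pathlib import Path


def extract_task_models(task_path, out_dir):
    """Extract every .tflite member of a .task bundle.

    Hypothetical helper: it treats the bundle as a plain zip archive
    and returns the sorted names of the extracted .tflite members.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(task_path) as bundle:
        for name in bundle.namelist():
            # Skip anything in the bundle that is not a TFLite model.
            if name.endswith(".tflite"):
                bundle.extract(name, out_dir)
                extracted.append(name)
    return sorted(extracted)


# Example call (the paths are placeholders):
# extract_task_models("path/to/bundle.task", "models")
```

With the bundle described above, this should return the same two model names that `unzip` produced.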

Question 1: How do we interpret these models and how are they being used for tasks?

`pose_landmarks_detector.tflite` appears to be the one for pose detection. Visualizing the structure and outputs of both models in the Netron app shows that this model has the pose detection outputs.

However, I have difficulty understanding the shapes and meaning of two of its outputs: "Pose landmarks" with output shape `[1,195]` and "World landmarks for pose" with output shape `[1,117]`.

Question 2: How do we interpret the shapes `[1,195]` and `[1,117]`?
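One possible reading, stated as an assumption rather than a confirmed answer: both sizes factor through 39, since 195 = 39 × 5 and 117 = 39 × 3. That would fit 39 points per pose (33 body landmarks plus 6 auxiliary points), with 5 values per point in "Pose landmarks" (assumed to be x, y, z, visibility, presence) and 3 values per point in "World landmarks for pose" (assumed to be x, y, z). The sketch below only checks that arithmetic on dummy data. The helper `split_points` and the field layout are hypothetical and not taken from the model's documentation.

```python
def split_points(flat, values_per_point):
    """Group a flat list of numbers into per-point tuples.

    Hypothetical helper: the grouping size is an assumption about how
    the model packs its output, not something the model file states.
    """
    if len(flat) % values_per_point != 0:
        raise ValueError("length is not a multiple of values_per_point")
    return [
        tuple(flat[i:i + values_per_point])
        for i in range(0, len(flat), values_per_point)
    ]


# Dummy stand-ins for the two output tensors with the batch axis dropped.
pose_flat = [0.0] * 195   # "Pose landmarks", shape [1, 195]
world_flat = [0.0] * 117  # "World landmarks for pose", shape [1, 117]

# Assumed layout: 5 values per point vs. 3 values per point.
pose_points = split_points(pose_flat, 5)
world_points = split_points(world_flat, 3)

print(len(pose_points), len(world_points))  # prints "39 39"
```

If this reading is right, the first 33 tuples would be the body landmarks and the remaining 6 would be auxiliary points, but that split is also an assumption here.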

And finally,

Question 3: How do we interpret the structure of the model, and in particular how does it relate to BlazePose and MobileNetV2? Also, is there any support for fine-tuning, that is, reusing the trained backbone of this model and writing a custom head?

Sep 12 '24 14:09 mbkamran
