Use Case: Temporally Coherent Pose Estimation

Open AmitMY opened this issue 1 year ago • 2 comments

Frameworks such as MediaPipe and OpenPose extract skeletal keypoints from images. Unfortunately, when poses are extracted from consecutive video frames, the results are inconsistent and jittery.

I propose a use case that TAPIR could support:

  1. Extract poses for an initial frame using MediaPipe, or perhaps for the whole video.
  2. Track the keypoints across frames. Prefer tapir's tracking. If tapir and mediapipe diverge, fall back to the mediapipe pose and continue tracking from there.

This idea is similar to how the video codecs used in MP4 files work: the MediaPipe poses act as I-frames (gold keyframes), and the TAPIR tracks act as P-frames for as long as they stay consistent. When the tracked data is no longer consistent, introduce another I-frame. (This can also be done per frame and per keypoint.)
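To make the proposal concrete, here is a minimal sketch of the per-frame, per-keypoint fallback. It is an illustration under assumptions, not TAPIR's or MediaPipe's API: `fuse_poses`, `track_step`, and `max_dist` are hypothetical names, `detected` stands in for detector keypoints computed for every frame, and `track_step` stands in for whatever point tracker is used. Divergence is assumed to mean a pixel distance above a fixed threshold.

```python
import numpy as np


def fuse_poses(detected, track_step, max_dist=10.0):
    """Prefer tracked keypoints; fall back to the detector when they diverge.

    detected:   array of shape (T, K, 2), detector keypoints for every frame.
    track_step: hypothetical callable (prev_points, t) -> (K, 2) array giving
                the tracker's estimate at frame t from the points at frame t - 1.
    max_dist:   assumed divergence threshold, in pixels.

    Returns (fused, reset): fused is (T, K, 2); reset is a (T, K) boolean mask
    marking where the detector pose was used as a new keyframe.
    """
    detected = np.asarray(detected, dtype=float)
    num_frames, num_keypoints, _ = detected.shape
    fused = np.empty_like(detected)
    reset = np.zeros((num_frames, num_keypoints), dtype=bool)

    # The first frame is always a keyframe taken from the detector.
    fused[0] = detected[0]
    reset[0] = True

    for t in range(1, num_frames):
        # Track from the previous fused points, so a reset restarts tracking.
        tracked = np.asarray(track_step(fused[t - 1], t), dtype=float)
        dist = np.linalg.norm(tracked - detected[t], axis=-1)
        diverged = dist > max_dist
        # Per keypoint: keep the track unless it drifted too far.
        fused[t] = np.where(diverged[:, None], detected[t], tracked)
        reset[t] = diverged

    return fused, reset
```

Because each step tracks from the previous fused points, a keypoint that falls back to the detector continues tracking from that corrected position. A real implementation would likely also use tracker and detector confidence scores rather than distance alone.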

Related issue: https://github.com/qianqianwang68/omnimotion/issues/5

AmitMY avatar Jun 17 '23 16:06 AmitMY

Have you made any progress on this? I'm considering using mediapipe keypoints as well

pranavmalikk avatar Jun 29 '23 03:06 pranavmalikk

> Have you made any progress on this? I'm considering using mediapipe keypoints as well

I haven't attempted an implementation yet. I think it would be really cool once I (or someone else) have the time to play with it.

AmitMY avatar Jun 30 '23 06:06 AmitMY

Closing due to inactivity. We don't currently have any work in this specific direction.

cdoersch avatar Jul 15 '24 17:07 cdoersch