HumanPoseMemo

A memo on 3D human pose estimation: a record of datasets, papers, and code.



2D datasets

related works

For demo

3D datasets

related works

SMPL datasets

related works

Dressed datasets

related works

Face, Hands and Feet

related works


Note: some papers without released code are not included.

  • Before 2020
  • CVPR 2020
  • ECCV 2020

Before 2020

Monocular human pose estimation



Multi-view human pose estimation



Detailed human shape reconstruction



Multi-View Stereo






keywords: human, motion, tracking, person, pose

2D human pose

monocular 3D pose

multi view

  • ActiveMoCap: Optimized Viewpoint Selection for Active Human Motion Capture
  • Multi-View Neural Human Rendering
  • Fusing Wearable IMUs With Multi-View Images for Human Pose Estimation: A Geometric Approach
  • Cross-View Tracking for Multi-Human 3D Pose Estimation at Over 100 FPS
  • 4D Association Graph for Realtime Multi-Person Motion Capture Using Multiple Video Cameras
  • Deep 3D Capture: Geometry and Reflectance From Sparse Multi-View Images
  • Lightweight Multi-View 3D Pose Estimation Through Camera-Disentangled Representation

depth, detailed, cloth

  • PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization:[code]
  • Self-Supervised Human Depth Estimation From Monocular Videos
  • ARCH: Animatable Reconstruction of Clothed Humans
  • DeepCap: Monocular Human Performance Capture Using Weak Supervision
  • TetraTSDF: 3D Human Reconstruction From a Single Image With a Tetrahedral Outer Shell
  • Learning to Transfer Texture From Clothing Images to 3D Humans
  • TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style[oral]
  • Novel View Synthesis of Dynamic Scenes With Globally Coherent Depths From a Monocular Camera
  • 4D Visualization of Dynamic Events From Unconstrained Multi-View Videos
  • Multi-View Neural Human Rendering

HH, HO interactions

  • Discovering Human Interactions With Novel Objects via Zero-Shot Learning
  • Mixture Dense Regression for Object Detection and Human Pose Estimation
  • VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions
  • PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection
  • Learning Human-Object Interaction Detection Using Interaction Points
  • Cascaded Human-Object Interaction Recognition
  • GanHand: Predicting Human Grasp Affordances in Multi-Object Scenes
  • Detailed 2D-3D Joint Representation for Human-Object Interaction

action, tracking, trajectory, prediction

  • Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction
  • Active Vision for Early Recognition of Human Actions
  • Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition
  • Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction: [code]
  • Reciprocal Learning Networks for Human Trajectory Prediction
  • PaStaNet: Toward Human Activity Knowledge Engine
  • A Stochastic Conditioning Scheme for Diverse Human Motion Prediction
  • Bayesian Adversarial Human Motion Synthesis[oral]
  • Learning Dynamic Relationships for 3D Human Motion Prediction
  • Context-Aware Human Motion Prediction
  • Learning a Neural Solver for Multiple Object Tracking[oral]
  • Skeleton-Based Action Recognition With Shift Graph Convolutional Network

face, hand

  • Understanding Human Hands in Contact at Internet Scale
  • AvatarMe: Realistically Renderable 3D Facial Reconstruction “In-the-Wild”
  • Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild
  • Deep Facial Non-Rigid Multi-View Stereo
  • Can Facial Pose and Expression Be Separated With Weak Perspective Camera?


  • HUMBI: A Large Multiview Dataset of Human Body Expressions
  • PANDA: A Gigapixel-Level Human-Centric Video Dataset
  • HOnnotate: A Method for 3D Annotation of Hand and Object Poses

some interesting works

  • End-to-End Camera Calibration for Broadcast Videos
  • Transferring Dense Pose to Proximal Animal Classes
  • Dynamic Graph Message Passing Networks
  • Self-Learning Video Rain Streak Removal: When Cyclic Consistency Meets Temporal Correspondence
  • Learning to Optimize Non-Rigid Tracking
  • SuperGlue: Learning Feature Matching With Graph Neural Networks
  • Spatial-Temporal Graph Convolutional Network for Video-Based Person Re-Identification
  • Minimal Solutions to Relative Pose Estimation From Two Views Sharing a Common Direction With Unknown Focal Length
  • NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis: [code],[code-PyTorch]
  • Learning Character-Agnostic Motion for Motion Retargeting in 2D: decomposes and recomposes the video; could be used for motion retrieval.


2D human pose

3D human pose

multi-person 3D



  • Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition
  • Hidden Footprints: Learning Contextual Walkability from 3D Human Trails
  • MotionSqueeze: Neural Motion Feature Learning for Video Understanding
  • Structure-Aware Human-Action Generation

face, hand, detailed human



submitted to CVPR 2021


2D Pose

3D Pose









You can contribute to this repo by forking it and opening a pull request.

See also Awesome Human Pose Estimation and awesome-3d-human.