awesome-avatar
📖 A curated list of resources dedicated to avatar.
This is a repository for organizing papers, code, and other resources related to the topic of Avatar (talking-face and talking-body).
🔆 This project is ongoing; pull requests are welcome!
If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to open a pull request.
TO DO LIST
- [x] Main paper list
- [x] Researchers list
- [x] Toolbox for avatar
- [x] Add paper link
- [ ] Add paper notes
- [x] Add code links where available
- [x] Add project pages where available
- [x] Datasets and metrics
- [x] Related links
Researchers and labs
- NVIDIA Research
  - Neural rendering models for human generation: vid2vid NeurIPS'18, fs-vid2vid NeurIPS'19, EG3D CVPR'22
  - Talking-face synthesis: face-vid2vid CVPR'21, Implicit NeurIPS'22, SPACE ICCV'23, One-shot Neural Head Avatar arXiv'23
  - Talking-body synthesis: DreamPose ICCV'23
  - Face enhancement (relighting, restoration, etc.): Lumos SIGGRAPH Asia 2022, RANA ICCV'23
  - Authorized use of synthetic videos: Avatar Fingerprinting arXiv'23
- Aliaksandr Siarohin @ Snap Research
  - Neural rendering models for human generation (focus on flow-based generative models): Unsupervised-Volumetric-Animation CVPR'23, 3DAvatarGAN CVPR'23, 3D-SGAN ECCV'22, Articulated-Animation CVPR'21, Monkey-Net CVPR'19, FOMM NeurIPS'19
- Ziwei Liu @ Nanyang Technological University
  - Talking-face synthesis: StyleSync CVPR'23, AV-CAT SIGGRAPH Asia 2022, StyleGANEX ICCV'23, StyleSwap ECCV'22, PC-AVS CVPR'21, Speech2Talking-Face IJCAI'21, VToonify SIGGRAPH Asia 2022
  - Talking-body synthesis: MotionDiffuse arXiv'22
  - Face enhancement (relighting, restoration, etc.): Relighting4D ECCV'22
- Xiaodong Cun @ Tencent AI Lab
  - Talking-face synthesis: StyleHEAT ECCV'22, VideoReTalking SIGGRAPH Asia'22, ToonTalker ICCV'23, DPE CVPR'23, CodeTalker CVPR'23, SadTalker CVPR'23
  - Talking-body synthesis: LivelySpeaker ICCV'23
- Max Planck Institute for Informatics
  - 3D face models (e.g., 3DMM): FLAME SIGGRAPH Asia 2017
Papers
Example: [Conference'year] Title, First-author Affiliation, ProjectPage, Code
Avatar (face+body)
[arXiv 2024.01] From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations, Meta Reality Labs Research, ProjectPage, Code
2D talking-face synthesis
- [MM 2020] Wav2Lip: Accurately Lip-sync Videos to Any Speech, International Institute of Information Technology (IIIT) Hyderabad, India, ProjectPage, Code ⭐
- [MM 2021] Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis, Tsinghua University, Code
- [CVPR 2021] Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation, The Chinese University of Hong Kong, ProjectPage, Code
- [ICCV 2021] PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering, Peking University, ProjectPage, Code
- [ECCV 2022] StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN, Tsinghua University, ProjectPage, Code
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild, Xidian University, ProjectPage, Code
- [AAAI 2023] DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video, Virtual Human Group, Netease Fuxi AI Lab, Code ⭐
- [CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation, Xi'an Jiaotong University, ProjectPage, Code, Note
- [CVPR 2023] DPE: Disentanglement of Pose and Expression for General Video Portrait Editing, MAIS & NLPR, Institute of Automation, Chinese Academy of Sciences, ProjectPage, Code
- [ICCV 2023] MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions, International Digital Economy Academy (IDEA), China, ProjectPage, Code
- [ICCV 2023] ToonTalker: Cross-Domain Face Reenactment, Tsinghua University, ProjectPage, Code
- [arXiv 2023] DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models, Tsinghua University, ProjectPage, Code
- [ICCV 2023] Imitator: Personalized Speech-driven 3D Facial Animation, Max Planck Institute for Intelligent Systems, ProjectPage, Code
- [ICLR 2024] GAIA: Zero-shot Talking Avatar Generation, Microsoft Research
- [ICLR 2024 (Spotlight)] Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis, Zhejiang University, ProjectPage, Code
- [Tech Report 2024.03] VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis, Google Research, ProjectPage, audio-driven avatar synthesis (head motion, gaze, blinking, lip movement, upper-body and hand gestures), Highly Recommended!
- [Github repo] MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting, Tencent TMElyralab ⭐
- [arXiv 2024] VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time, Microsoft Research, ProjectPage
3D talking-face synthesis
- [ICCV 2021] AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis, University of Science and Technology of China, ProjectPage, Code
- [ECCV 2022] Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis, Tsinghua University, ProjectPage, Code
- [ICLR 2023] GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis, Zhejiang University, ProjectPage, Code
- [ICCV 2023] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis, Beihang University, ProjectPage, Code ⭐
- [arXiv 2023] GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation, Zhejiang University, ProjectPage, Code
- [CVPR 2024] SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis, Renmin University of China, ProjectPage, Code ⭐
Talking-body synthesis
Co-speech gesture synthesis
- [SIGGRAPH Asia 2020] Gesture Generation from Trimodal Context, Korea Advanced Institute of Science and Technology (KAIST), Code
- [ICCV 2021] Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates, ShanghaiTech University, ProjectPage, Code
- [ICCV 2021] Audio2Motion: Generating Diverse Gestures from Speech with Conditional Variational Autoencoders, Harbin Institute of Technology, Shenzhen, Code
- [CVPR 2022] Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation, The Chinese University of Hong Kong, ProjectPage, Code
- [CVPR 2023] Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation, The University of Hong Kong, Code
- [CVPR 2024] EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling, The University of Tokyo, ProjectPage, Code
Pose2video
- [NeurIPS 2018] Video-to-Video Synthesis, NVIDIA, ProjectPage, Code
- [ICCV 2019] Everybody Dance Now, UC Berkeley, ProjectPage, Code
- [arXiv 2023] Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation, Alibaba Group, ProjectPage, Code
  - [Unofficial reprod] MooreThreads/Moore-AnimateAnyone
- [CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model, National University of Singapore, ProjectPage, Code
- [arXiv 2024.03] Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance, Nanjing University, ProjectPage, Code
- [Github repo] MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising, Tencent TMElyralab
- [Github repo] MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation, Tencent, Code ⭐
Datasets
Talking-face
| Dataset name | Environment | Year | Resolution | Subjects | Duration | Sentences |
|---|---|---|---|---|---|---|
| VoxCeleb1 | Wild | 2017 | 360p~720p | 1251 | 352 hours | 100k |
| VoxCeleb2 | Wild | 2018 | 360p~720p | 6112 | 2442 hours | 1128k |
| HDTF | Wild | 2020 | 720p~1080p | 300+ | 15.8 hours | |
| LSP | Wild | 2021 | 720p~1080p | 4 | 18 minutes | 100k |
| Dataset name | Environment | Year | Resolution | Subjects | Duration | Sentences |
|---|---|---|---|---|---|---|
| CMLR | Lab | 2019 | | 11 | | 102k |
| MAVD | Lab | 2023 | 1920x1080 | 64 | 24 hours | 12k |
| CN-Celeb | Wild | 2020 | | 3000 | 1200 hours | |
| CN-Celeb-AV | Wild | 2023 | | 1136 | 660 hours | |
| CN-CVS | Wild | 2023 | | 2500+ | 300+ hours | |
Talking-body
- [ICRA'19] TED Gesture Dataset, ProjectPage
- [CVPR'22] TED Expressive Dataset, ProjectPage
- [ECCV'22] BEAT: A Body Expression Audio Text Dataset with Emotional and Semantic Annotations for Conversational Gesture Synthesis, ProjectPage, Code and Dataset
Metrics
Talking-face
| Metric name | Description | Code/Paper |
|---|---|---|
| LMD↓ | Mouth landmark distance | |
| MA↑ | Intersection-over-Union (IoU) between the predicted mouth area and the ground-truth mouth area | |
| Sync↑ | Confidence score from SyncNet | wav2lip |
| LSE-C↑ | Lip Sync Error - Confidence | wav2lip |
| LSE-D↓ | Lip Sync Error - Distance | wav2lip |
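As a concrete reference for the first metric: LMD is typically computed as the mean L2 distance between corresponding mouth landmarks in generated and ground-truth frames. A minimal NumPy sketch, assuming the landmarks have already been extracted by a detector of your choice (e.g., dlib or face_alignment):

```python
import numpy as np

def lmd(pred_landmarks: np.ndarray, gt_landmarks: np.ndarray) -> float:
    """Mouth landmark distance: mean L2 distance between corresponding
    mouth landmarks, averaged over points and frames.

    Both arrays have shape (num_frames, num_mouth_points, 2).
    """
    assert pred_landmarks.shape == gt_landmarks.shape
    # L2 distance per landmark, then average over all points and frames.
    per_point = np.linalg.norm(pred_landmarks - gt_landmarks, axis=-1)
    return float(per_point.mean())
```

Lower is better; papers usually report LMD on the mouth-region subset of the 68-point landmark convention.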
| Metric name | Description | Code/Paper |
|---|---|---|
| MAE↓ | Mean Absolute Error between images | mmagic |
| MSE↓ | Mean Squared Error between images | mmagic |
| PSNR↑ | Peak Signal-to-Noise Ratio | mmagic |
| SSIM↑ | Structural Similarity between images | mmagic |
| FID↓ | Fréchet Inception Distance | mmagic |
| IS↑ | Inception Score | mmagic |
| NIQE↓ | Natural Image Quality Evaluator | mmagic |
| CSIM↑ | Cosine similarity of identity embeddings | InsightFace |
| CPBD↑ | Cumulative Probability of Blur Detection | python-cpbd |
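Two of these metrics are simple enough to compute directly. A minimal NumPy sketch of PSNR and of the CSIM cosine similarity (the identity embeddings themselves would come from a face-recognition model such as InsightFace, which is assumed external here):

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

def csim(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two 1-D identity embeddings."""
    return float(np.dot(emb_a, emb_b)
                 / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
```

For benchmark comparability, prefer the mmagic implementations listed above; this sketch only illustrates the definitions.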
| Metric name | Description | Code/Paper |
|---|---|---|
| Diversity of head motions↑ | Standard deviation of head-motion feature embeddings extracted from the generated frames with Hopenet (Ruiz et al., 2018) | SadTalker |
| Beat Align Score↑ | Alignment between the audio beats and the generated head motions, computed as in Bailando (Siyao et al., 2022) | SadTalker |
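The diversity score can be sketched as the standard deviation of per-frame pose embeddings. In SadTalker's protocol the embeddings come from Hopenet; in this simplified sketch they are just an input array:

```python
import numpy as np

def motion_diversity(embeddings: np.ndarray) -> float:
    """Diversity of head motions: per-dimension standard deviation of
    frame-level head-motion embeddings, averaged over dimensions.

    `embeddings` has shape (num_frames, feature_dim).
    """
    return float(embeddings.std(axis=0).mean())
```

A static head yields a score near zero; larger, more varied head motion increases it.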
Talking-body
TBD
Toolbox
- A general toolbox for AIGC, including common metrics and models https://github.com/open-mmlab/mmagic
- face3d: Python tools for processing 3D face https://github.com/yfeng95/face3d
- 3DMM model fitting using Pytorch https://github.com/ascust/3DMM-Fitting-Pytorch
- OpenFace: a facial behavior analysis toolkit https://github.com/TadasBaltrusaitis/OpenFace
- autocrop: Automatically detects and crops faces from batches of pictures https://github.com/leblancfg/autocrop
- OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation https://github.com/CMU-Perceptual-Computing-Lab/openpose
- GFPGAN: Practical Algorithm for Real-world Face Restoration https://github.com/TencentARC/GFPGAN
- CodeFormer: Robust Blind Face Restoration https://github.com/sczhou/CodeFormer
Related Links
If you are interested in avatars and digital humans, check out these related collections:
- awesome digital human https://github.com/weihaox/awesome-digital-human