awesome-avatar
📖 A curated list of resources dedicated to avatar.
This is a repository for organizing papers, code, and other resources related to the topic of Avatar (talking-face and talking-body).
🔆 This project is ongoing; pull requests are welcome!
If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to open a pull request.
TO DO LIST
- [x] Main paper list
- [x] Researchers list
- [x] Toolbox for avatar
- [x] Add paper link
- [ ] Add paper notes
- [x] Add code links where available
- [x] Add project pages where available
- [x] Datasets and metrics
- [x] Related links
Researchers and labs
- NVIDIA Research
  - Neural rendering models for human generation: vid2vid NeurIPS'18, fs-vid2vid NeurIPS'19, EG3D CVPR'22
  - Talking-face synthesis: face-vid2vid CVPR'21, Implicit NeurIPS'22, SPACE ICCV'23, One-shot Neural Head Avatar arXiv'23
  - Talking-body synthesis: DreamPose ICCV'23
  - Face enhancement (relighting, restoration, etc.): Lumos SIGGRAPH Asia 2022, RANA ICCV'23
  - Authorized use of synthetic videos: Avatar Fingerprinting arXiv'23
- Aliaksandr Siarohin @ Snap Research
  - Neural rendering models for human generation (focus on flow-based generative models): Unsupervised-Volumetric-Animation CVPR'23, 3DAvatarGAN CVPR'23, 3D-SGAN ECCV'22, Articulated-Animation CVPR'21, Monkey-Net CVPR'19, FOMM NeurIPS'19
- Ziwei Liu @ Nanyang Technological University
  - Talking-face synthesis: StyleSync CVPR'23, AV-CAT SIGGRAPH Asia 2022, StyleGANEX ICCV'23, StyleSwap ECCV'22, PC-AVS CVPR'21, Speech2Talking-Face IJCAI'21, VToonify SIGGRAPH Asia 2022
  - Talking-body synthesis: MotionDiffuse arXiv'22
  - Face enhancement (relighting, restoration, etc.): Relighting4D ECCV'22
- Xiaodong Cun @ Tencent AI Lab
  - Talking-face synthesis: StyleHEAT ECCV'22, VideoReTalking SIGGRAPH Asia'22, ToonTalker ICCV'23, DPE CVPR'23, CodeTalker CVPR'23, SadTalker CVPR'23
  - Talking-body synthesis: LivelySpeaker ICCV'23
- Max Planck Institute for Informatics
  - 3D face models (e.g., 3DMM): FLAME SIGGRAPH Asia 2017
Papers
Example: [Conference'year] Title, First-author Affiliation, ProjectPage, Code
Avatar (face+body)
[arXiv 2024.01] From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations, Meta Reality Labs Research, ProjectPage, Code
2D talking-face synthesis
- [MM 2020] Wav2Lip: Accurately Lip-sync Videos to Any Speech, International Institute of Information Technology (IIIT) Hyderabad, India, ProjectPage, Code ⭐
- [MM 2021] Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis, Tsinghua University, Code
- [CVPR 2021] Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation, The Chinese University of Hong Kong, ProjectPage, Code
- [ICCV 2021] PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering, Peking University, ProjectPage, Code
- [ECCV 2022] StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN, Tsinghua University, ProjectPage, Code
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild, Xidian University, ProjectPage, Code
- [AAAI 2023] DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video, Virtual Human Group, Netease Fuxi AI Lab, Code ⭐
- [CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation, Xi'an Jiaotong University, ProjectPage, Code, Note
- [CVPR 2023] DPE: Disentanglement of Pose and Expression for General Video Portrait Editing, MAIS & NLPR, Institute of Automation, Chinese Academy of Sciences, ProjectPage, Code
- [ICCV 2023] MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions, International Digital Economy Academy (IDEA), China, ProjectPage, Code
- [ICCV 2023] ToonTalker: Cross-Domain Face Reenactment, Tsinghua University, ProjectPage, Code
- [arXiv 2023] DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models, Tsinghua University, ProjectPage, Code
- [ICCV 2023] Imitator: Personalized Speech-driven 3D Facial Animation, Max Planck Institute for Intelligent Systems, ProjectPage, Code
- [ICLR 2024] GAIA: Zero-shot Talking Avatar Generation, Microsoft Research
- [ICLR 2024 (Spotlight)] Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis, Zhejiang University, ProjectPage, Code
- [Tech Report 2024.03] VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis, Google Research, ProjectPage, audio-driven avatar synthesis (head motion, gaze, blinking, lip movement, upper-body and hand gestures), Highly Recommended!
- [Github repo] MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting, Tencent TMElyralab ⭐
- [arXiv 2024] VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time, Microsoft Research, ProjectPage
3D talking-face synthesis
- [ICCV 2021] AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis, University of Science and Technology of China, ProjectPage, Code
- [ECCV 2022] Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis, Tsinghua University, ProjectPage, Code
- [ICLR 2023] GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis, Zhejiang University, ProjectPage, Code
- [ICCV 2023] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis, Beihang University, ProjectPage, Code ⭐
- [arXiv 2023] GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation, Zhejiang University, ProjectPage, Code
- [CVPR 2024] SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis, Renmin University of China, ProjectPage, Code ⭐
Talking-body synthesis
Co-speech gesture synthesis
- [SIGGRAPH Asia 2020] Gesture Generation from Trimodal Context, Korea Advanced Institute of Science and Technology (KAIST), Code
- [ICCV 2021] Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates, ShanghaiTech University, ProjectPage, Code
- [ICCV 2021] Audio2Motion: Generating Diverse Gestures from Speech with Conditional Variational Autoencoders, Harbin Institute of Technology, Shenzhen, Code
- [CVPR 2022] Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation, The Chinese University of Hong Kong, ProjectPage, Code
- [CVPR 2023] Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation, The University of Hong Kong, Code
- [CVPR 2024] EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling, The University of Tokyo, ProjectPage, Code
Pose2video
- [NeurIPS 2018] Video-to-Video Synthesis, NVIDIA, ProjectPage, Code
- [ICCV 2019] Everybody Dance Now, UC Berkeley, ProjectPage, Code
- [arXiv 2023] Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation, Alibaba Group, ProjectPage, Code
  - [Unofficial reprod] MooreThreads/Moore-AnimateAnyone
- [CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model, National University of Singapore, ProjectPage, Code
- [arXiv 2024.03] Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance, Nanjing University, ProjectPage, Code
- [Github repo] MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising, Tencent TMElyralab
- [Github repo] MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation, Tencent, Code ⭐
Datasets
Talking-face
| Dataset name | Environment | Year | Resolution | Subjects | Duration | Sentences |
|---|---|---|---|---|---|---|
| VoxCeleb1 | Wild | 2017 | 360p~720p | 1251 | 352 hours | 100k |
| VoxCeleb2 | Wild | 2018 | 360p~720p | 6112 | 2442 hours | 1128k |
| HDTF | Wild | 2020 | 720p~1080p | 300+ | 15.8 hours | |
| LSP | Wild | 2021 | 720p~1080p | 4 | 18 minutes | 100k |
| Dataset name | Environment | Year | Resolution | Subjects | Duration | Sentences |
|---|---|---|---|---|---|---|
| CMLR | Lab | 2019 | | 11 | | 102k |
| MAVD | Lab | 2023 | 1920x1080 | 64 | 24 hours | 12k |
| CN-Celeb | Wild | 2020 | | 3000 | 1200 hours | |
| CN-Celeb-AV | Wild | 2023 | | 1136 | 660 hours | |
| CN-CVS | Wild | 2023 | | 2500+ | 300+ hours | |
Talking-body
- [ICRA'19] TED Gesture Dataset, ProjectPage
- [CVPR'22] TED Expressive Dataset, ProjectPage
- [ECCV'22] BEAT: A Body Expression Audio Text Dataset with Emotional and Semantic Annotations for Conversational Gesture Synthesis, ProjectPage, Code and Dataset
Metrics
Talking-face
| Metric name | Description | Code/Paper |
|---|---|---|
| LMD↓ | Mouth landmark distance | |
| MA↑ | Intersection-over-Union (IoU) between the predicted mouth area and the ground-truth mouth area | |
| Sync↑ | Confidence score from SyncNet | wav2lip |
| LSE-C↑ | Lip Sync Error - Confidence | wav2lip |
| LSE-D↓ | Lip Sync Error - Distance | wav2lip |
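As a concrete reference for the first metric: LMD is typically computed as the mean L2 distance between corresponding mouth landmarks in generated and ground-truth frames. A minimal NumPy sketch, assuming the landmarks have already been extracted by a detector of your choice (e.g., dlib or face_alignment):

```python
import numpy as np

def lmd(pred_landmarks: np.ndarray, gt_landmarks: np.ndarray) -> float:
    """Mouth landmark distance: mean L2 distance between corresponding
    mouth landmarks, averaged over points and frames.

    Both arrays have shape (num_frames, num_mouth_points, 2).
    """
    assert pred_landmarks.shape == gt_landmarks.shape
    # L2 distance per landmark, then average over all points and frames.
    per_point = np.linalg.norm(pred_landmarks - gt_landmarks, axis=-1)
    return float(per_point.mean())
```

Lower is better; papers usually report LMD on the mouth-region subset of the 68-point landmark convention.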
| Metric name | Description | Code/Paper |
|---|---|---|
| MAE↓ | Mean Absolute Error between images | mmagic |
| MSE↓ | Mean Squared Error between images | mmagic |
| PSNR↑ | Peak Signal-to-Noise Ratio | mmagic |
| SSIM↑ | Structural Similarity between images | mmagic |
| FID↓ | Fréchet Inception Distance | mmagic |
| IS↑ | Inception Score | mmagic |
| NIQE↓ | Natural Image Quality Evaluator | mmagic |
| CSIM↑ | Cosine similarity of identity embeddings | InsightFace |
| CPBD↑ | Cumulative Probability of Blur Detection | python-cpbd |
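Two of these metrics are simple enough to compute directly. A minimal NumPy sketch of PSNR and of the CSIM cosine similarity (the identity embeddings themselves would come from a face-recognition model such as InsightFace, which is assumed external here):

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

def csim(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two 1-D identity embeddings."""
    return float(np.dot(emb_a, emb_b)
                 / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
```

For benchmark comparability, prefer the mmagic implementations listed above; this sketch only illustrates the definitions.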
| Metric name | Description | Code/Paper |
|---|---|---|
| Diversity of head motions↑ | Standard deviation of head-motion feature embeddings extracted from the generated frames with Hopenet (Ruiz et al., 2018) | SadTalker |
| Beat Align Score↑ | Alignment between the audio beats and the generated head motions, computed as in Bailando (Siyao et al., 2022) | SadTalker |
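The diversity score can be sketched as the standard deviation of per-frame pose embeddings. In SadTalker's protocol the embeddings come from Hopenet; in this simplified sketch they are just an input array:

```python
import numpy as np

def motion_diversity(embeddings: np.ndarray) -> float:
    """Diversity of head motions: per-dimension standard deviation of
    frame-level head-motion embeddings, averaged over dimensions.

    `embeddings` has shape (num_frames, feature_dim).
    """
    return float(embeddings.std(axis=0).mean())
```

A static head yields a score near zero; larger, more varied head motion increases it.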
Talking-body
TBD
Toolbox
- A general toolbox for AIGC, including common metrics and models https://github.com/open-mmlab/mmagic
- face3d: Python tools for processing 3D face https://github.com/yfeng95/face3d
- 3DMM model fitting using Pytorch https://github.com/ascust/3DMM-Fitting-Pytorch
- OpenFace: a facial behavior analysis toolkit https://github.com/TadasBaltrusaitis/OpenFace
- autocrop: Automatically detects and crops faces from batches of pictures https://github.com/leblancfg/autocrop
- OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation https://github.com/CMU-Perceptual-Computing-Lab/openpose
- GFPGAN: Practical Algorithm for Real-world Face Restoration https://github.com/TencentARC/GFPGAN
- CodeFormer: Robust Blind Face Restoration https://github.com/sczhou/CodeFormer
Related Links
If you are interested in avatars and digital humans, check out these related collections:
- awesome digital human https://github.com/weihaox/awesome-digital-human