Audio2Face
Audio to Face Blendshape implementation with PyTorch.
Reproduced by: 刘宇昂
- Base model (a minimal sketch of both architectures follows this list)
  - LSTM using MFCC audio features
  - CNN (simplified NvidiaNet) with LPC features
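For orientation only, here is a minimal PyTorch sketch of the two base architectures. The feature dimensions (39 MFCC coefficients, LPC frames as a 2-D window), hidden sizes, layer counts, and the blendshape count of 51 are assumptions for illustration, not the repository's actual hyperparameters.

```python
import torch.nn as nn

class BlendshapeLSTM(nn.Module):
    """LSTM regressor from per-frame MFCC features to blendshape weights.
    Feature size, hidden size, and blendshape count are illustrative guesses."""
    def __init__(self, feat_dim=39, hidden=128, num_blendshapes=51):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, num_blendshapes)

    def forward(self, x):               # x: (batch, frames, feat_dim)
        out, _ = self.lstm(x)           # (batch, frames, hidden)
        return self.fc(out)             # (batch, frames, num_blendshapes)

class SimpleBlendshapeCNN(nn.Module):
    """Small convolutional net over a window of LPC frames, loosely in the
    spirit of a simplified NvidiaNet; layer sizes are guesses."""
    def __init__(self, num_blendshapes=51):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_blendshapes)

    def forward(self, x):               # x: (batch, 1, frames, lpc_dim)
        return self.fc(self.conv(x).flatten(1))
```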
Prerequisites
- Python3
- PyTorch v0.3.0
- numpy
- librosa & audiolazy
- scipy
- etc.
Files
Scripts to run
- main.py: change the net name and set the checkpoints folder to train different models
- test_model.py: generate blendshape sequences from extracted audio features (needs audio features as input)
- synthesis.py: generate blendshapes directly from an input wav (needs the input audio path as an argument)
Classes
- models.py: classes for the LSTM and CNN (simplified NvidiaNet) models.
- models_testae.py: advanced models with an autoencoder design.
- dataset.py: class for loading the dataset (a guessed sketch of its layout follows this list).
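The dataset class itself is not reproduced here; the following is only a guess at its shape, assuming the combined feature and blendshape files are stored as two frame-aligned NumPy arrays.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class BlendshapeDataset(Dataset):
    """Pairs per-frame audio features with blendshape targets.
    The two-aligned-.npy-files format is an assumption."""
    def __init__(self, feature_path, blendshape_path):
        self.features = np.load(feature_path).astype(np.float32)
        self.targets = np.load(blendshape_path).astype(np.float32)
        assert len(self.features) == len(self.targets)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return torch.from_numpy(self.features[idx]), torch.from_numpy(self.targets[idx])
```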
Input preprocessing
- misc/audio_mfcc.py: extract MFCC features from input wav files (see the extraction sketch after this list)
- misc/audio_lpc.py: extract LPC features
- misc/combine.py: combine the per-clip audio feature/blendshape files into a single file for data loading
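As a rough picture of what the MFCC step does, here is a minimal librosa-based sketch. The sample rate, hop length, number of coefficients, and file names are placeholder values, not the settings used by misc/audio_mfcc.py.

```python
import librosa
import numpy as np

def extract_mfcc(wav_path, sr=16000, n_mfcc=13, hop_length=160):
    """Load a wav file and return per-frame MFCC features of shape (frames, n_mfcc).
    All parameter values here are illustrative assumptions."""
    audio, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
    return mfcc.T  # transpose to (frames, n_mfcc)

# features = extract_mfcc("clip_0001.wav")
# np.save("clip_0001_mfcc.npy", features)
```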
Usage
Input
To build your own dataset, preprocess your wav/blendshape pairs with misc/audio_mfcc.py or misc/audio_lpc.py, then combine the resulting feature/blendshape files with misc/combine.py into a single feature file and a single blendshape file, roughly as sketched below.
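The combining step presumably just stacks the per-clip arrays along the frame axis. A minimal sketch, assuming one .npy file per clip and hypothetical glob patterns and output names:

```python
import glob
import numpy as np

def combine(pattern, out_path):
    """Concatenate per-clip feature (or blendshape) arrays along the frame axis.
    The .npy-per-clip layout and the glob pattern are assumptions."""
    parts = [np.load(p) for p in sorted(glob.glob(pattern))]
    np.save(out_path, np.concatenate(parts, axis=0))

# combine("features/*_mfcc.npy", "train_features.npy")
# combine("blendshapes/*.npy", "train_blendshapes.npy")
```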
Training
Modify main.py: set the model to the one you need and specify the checkpoint folder. A hypothetical sketch of that configuration and a training loop follows.
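The variable names, paths, and hyperparameters below are made up for illustration; main.py's actual options may differ. A per-frame MLP stands in for the repository's LSTM/CNN models so the sketch stays self-contained.

```python
import os
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical names and paths; main.py's actual settings may differ.
NET_NAME = "lstm"                                     # edit to switch models
CHECKPOINT_DIR = os.path.join("checkpoints", NET_NAME)
os.makedirs(CHECKPOINT_DIR, exist_ok=True)

# Stand-in model: 39-d features and 51 blendshapes are guesses.
model = nn.Sequential(nn.Linear(39, 128), nn.ReLU(), nn.Linear(128, 51))

features = torch.from_numpy(np.load("train_features.npy").astype(np.float32))
targets = torch.from_numpy(np.load("train_blendshapes.npy").astype(np.float32))
loader = DataLoader(TensorDataset(features, targets), batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

for epoch in range(50):
    for feats, target in loader:
        optimizer.zero_grad()
        loss = criterion(model(feats), target)
        loss.backward()
        optimizer.step()
    torch.save(model.state_dict(),
               os.path.join(CHECKPOINT_DIR, "epoch_{}.pth".format(epoch)))
```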
Evaluation
- Both test_model.py and synthesis.py can be used to generate blendshape sequences (a rough inference sketch follows this list).
- test_model.py accepts extracted audio features (MFCC/LPC).
- synthesis.py takes a raw wav file as input. State the arguments and it will produce a blendshape test file.
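As a rough picture of the feature-driven path (what test_model.py does), here is a sketch that reuses the stand-in model and hypothetical paths from the training sketch; substitute the real class from models.py and your own checkpoint.

```python
import numpy as np
import torch
import torch.nn as nn

# Same stand-in model and hypothetical paths as in the training sketch.
model = nn.Sequential(nn.Linear(39, 128), nn.ReLU(), nn.Linear(128, 51))
model.load_state_dict(torch.load("checkpoints/lstm/epoch_49.pth"))
model.eval()

features = torch.from_numpy(np.load("test_features.npy").astype(np.float32))
with torch.no_grad():
    blendshapes = model(features).numpy()   # (frames, num_blendshapes)
np.save("test_blendshapes.npy", blendshapes)
```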