Cross-Modal Perceptionist
CVPR 2022 "Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?"
Cho-Ying Wu, Chin-Cheng Hsu, Ulrich Neumann, University of Southern California
[Paper] [Project page] [Voxceleb-3D Data]
[TODO]: Evaluation code; Training code
We study cross-modal learning and analyze the correlation between voices and 3D face geometry. Unlike previous methods that study the voice-face correlation only in the 2D domain, we choose a 3D representation, which can better validate the supportive physiological evidence that voices correlate with skeletal and articulator structures, which in turn potentially affect facial geometry.
Comparison of recovered 3D face meshes with the baseline.
Consistency for the same identity using different utterances.
Demo: Preprocessed fbank
We tested on Ubuntu 16.04 LTS with an NVIDIA 2080 Ti (only GPU execution is supported) and use Anaconda to install packages.
Install packages
- `conda create --name CMP python=3.8`
- Install PyTorch compatible with your machine; we tested on PyTorch v1.9 (it should be compatible with other 1.x versions)
- Install the other dependencies: opencv-python, scipy, PIL, Cython, pyaudio

Or use the provided environment.yml instead:

```bash
conda env create -f environment.yml
conda activate CMP
```
- Build the rendering toolkit (C++ and Cython) for overlaying 3D meshes on images (a usage sketch follows the download step below):

```bash
cd Sim3DR
bash build_sim3dr.sh
cd ..
```
Download pretrained models and 3DMM configuration data
- Download from [here] (~160 MB) and unzip under the root folder. This creates 'pretrained_models' and 'train.configs' under the root folder.
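For orientation, here is a minimal sketch of overlaying a mesh on an image with the built toolkit. It assumes Sim3DR exposes a `RenderPipeline` like the one in [3DDFA-V2], which this project builds on; the file names and lighting configuration below are illustrative, not values from this repo.

```python
# Hedged sketch: overlay a 3D mesh on an image with Sim3DR.
# Assumes a 3DDFA-V2-style RenderPipeline; paths and cfg are illustrative.
import cv2
import numpy as np
from Sim3DR import RenderPipeline

cfg = {
    'intensity_ambient': 0.3, 'color_ambient': (1, 1, 1),
    'intensity_directional': 0.6, 'color_directional': (1, 1, 1),
    'intensity_specular': 0.1, 'specular_exp': 5,
    'light_pos': (0, 0, 5), 'view_pos': (0, 0, 5),
}
render = RenderPipeline(**cfg)

img = cv2.imread('data/sample.jpg')            # illustrative input image
ver = np.load('data/sample_vertices.npy')      # (V, 3) vertices, assumed layout
tri = np.load('train.configs/tri.npy')         # (T, 3) triangle indices, assumed layout
overlay = render(ver.astype(np.float32), tri, img.copy())
cv2.imwrite('data/results/overlay.jpg', overlay)
```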
Read the preprocessed fbank for inference

```bash
python demo.py
```

This fetches the preprocessed MFCC features and uses them as network inputs. Results will be generated under data/results/ (pre-generated references are under data/results_reference).
More preprocessed MFCC and 3D mesh (3DMM parameter) pairs can be downloaded from [Voxceleb-3D Data].
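As a rough reference for what demo.py does internally, here is a minimal sketch of the feature-to-3DMM inference flow. The checkpoint name, feature file, and array layout are assumptions for illustration, not the repo's actual API.

```python
# Minimal sketch of the fbank/MFCC -> 3DMM inference flow.
# Checkpoint path, feature file, and shapes below are assumptions.
import numpy as np
import torch

model = torch.load('pretrained_models/cmp_voice2mesh.pth',
                   map_location='cuda')          # assumed checkpoint name
model.eval()

feats = np.load('data/fbank/sample_0.npy')       # (T, 40) filterbank features, assumed shape
x = torch.from_numpy(feats).float().unsqueeze(0).cuda()  # add batch dim -> (1, T, 40)

with torch.no_grad():
    params = model(x)                            # predicted 3DMM shape parameters

np.save('data/results/sample_0_params.npy', params.squeeze(0).cpu().numpy())
```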
Demo: Use device mic input
- Do steps 1-5 above. In addition, download the face type meshes and extract them under ./face_types
- Run:

```bash
python demo_mic.py
```

The demo records 5 seconds from your device microphone and predicts the face mesh.
We perform unsupervised gender classification based on the mean male and female shapes, calculating statistics between the predicted face and each mean shape. We also calculate the distance between the prediction and four face types (Regular, Slim, Skinny, Wide) and indicate which type the voice is closer to; a sketch of this comparison follows after this list.
- Results will be generated under data/results
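To make the last two steps concrete, below is a minimal sketch of the 5-second recording and the nearest-mean-shape comparison. The face-type file names, the placeholder `pred_mesh`, and the `closest_shape` helper are assumptions for illustration, not the repo's actual code.

```python
# Hedged sketch: record 5 s of mic audio with pyaudio, then report which
# reference mean shape a predicted mesh is closest to. File names and the
# placeholder pred_mesh are assumptions.
import numpy as np
import pyaudio

RATE, SECONDS, CHUNK = 16000, 5, 1024
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(RATE * SECONDS // CHUNK)]
stream.stop_stream(); stream.close(); pa.terminate()
audio = np.frombuffer(b''.join(frames), dtype=np.int16)  # raw 16 kHz mono samples

# ... feature extraction and network inference would produce the mesh here ...

def closest_shape(pred_mesh, references):
    """Name of the reference mesh with the smallest mean per-vertex L2 distance."""
    dists = {name: np.linalg.norm(pred_mesh - ref, axis=1).mean()
             for name, ref in references.items()}
    return min(dists, key=dists.get)

face_types = {name: np.load(f'face_types/{name}.npy')   # assumed (V, 3) meshes
              for name in ('regular', 'slim', 'skinny', 'wide')}
pred_mesh = np.zeros_like(face_types['regular'])         # placeholder for the network output
print('closest face type:', closest_shape(pred_mesh, face_types))
```

The unsupervised gender decision works the same way, with the mean male and female shapes as the references.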
Citation
If you find our work useful, please consider citing us.
@inproceedings{wu2022cross,
title={Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?},
author={Wu, Cho-Ying and Hsu, Chin-Cheng and Neumann, Ulrich},
booktitle={CVPR},
year={2022}
}
This project builds on [SynergyNet], [3DDFA-V2], and [reconstruction-faces-from-voice].