audio2photoreal
Is it real-time audio-to-face?
Hello,
Firstly, I want to extend my sincere thanks for the great work on this repository.
I have a question regarding the functionality: Is the audio-to-face feature designed to work in real-time?
Depends on how much compute you throw at it and how fast your GPUs are. You can try it out on whatever compute you have available ;)
I believe that if you don't use the rendering portion, you can run this in real time locally on consumer devices. Incidentally, please do this for the community: https://github.com/facebookresearch/audio2photoreal/issues/4
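For anyone who wants to sanity-check that on their own hardware, here is a minimal timing sketch in Python; generate_motion is a hypothetical stand-in for whatever non-rendering inference call you use, so only the real-time-factor arithmetic is the point:

import time

def real_time_factor(generate_motion, audio, clip_seconds):
    # Wall-clock inference time divided by audio duration.
    # A value below 1.0 means the stage keeps up with real time.
    start = time.perf_counter()
    generate_motion(audio)  # hypothetical inference entry point, not the repo's actual API
    return (time.perf_counter() - start) / clip_seconds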
If anyone has issues with the environment, you can use Docker:
# assumes the NVIDIA Container Toolkit is installed on the host so --gpus all can expose the GPU
docker run -dit --gpus all --name a2p nvidia/cuda:11.6.1-devel-ubuntu20.04
docker exec -it a2p bash
apt update
apt install vim git wget gcc ffmpeg libsm6 libxext6 -y
# install miniconda (-b runs the installer non-interactively into /root/miniconda3)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
/root/miniconda3/bin/conda init bash
# install repo ...
From the demo description: "4) Then, sit back and wait for the rendering to happen! This may take a while (e.g. 30 minutes)". Not sure if it will help answer the question, but for a 6 s audio clip on a V100, I got the following times for a single sample:
100% 100/100 [00:17<00:00, 5.71it/s]
created 3 samples
100% 100/100 [00:07<00:00, 14.13it/s]
created 3 samples
100% 120/120 [02:36<00:00, 1.31s/it]
Not sure what the third step is (I assume the actual avatar renderer is more performant). Anyway, while the first two networks are close to real time, the last process is roughly 30x slower than real time on a modest GPU.
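To make that concrete, here is the arithmetic from the logs above (wall-clock times divided by the 6 s clip length); the stage labels are my guesses, not confirmed:

clip_seconds = 6.0
# wall-clock times read off the tqdm bars above; 02:36 = 156 s
stage_times = {"stage 1": 17.0, "stage 2": 7.0, "stage 3 (rendering?)": 156.0}
for name, seconds in stage_times.items():
    print(f"{name}: {seconds / clip_seconds:.1f}x slower than real time")
# prints 2.8x, 1.2x, and 26.0x respectively, i.e. roughly the 30x figure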
Indeed, providing a bone dump to SMPL or Unity biped animation bones could eliminate the third, time-consuming step and make this an actual real-time technology:
https://github.com/facebookresearch/audio2photoreal/issues/4
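As a rough illustration of what such an export could look like: SMPL-style pose parameters are per-joint axis-angle rotations, while engines like Unity want quaternions. A minimal sketch of that conversion, assuming the model's pose output is an (n_joints, 3) axis-angle array (the names here are hypothetical, not the repo's actual output format):

import numpy as np
from scipy.spatial.transform import Rotation

def axis_angle_to_quaternions(pose):
    # Convert (n_joints, 3) axis-angle rotations to (n_joints, 4)
    # quaternions in (x, y, z, w) order, which is what Unity expects.
    return Rotation.from_rotvec(pose).as_quat()

pose = np.zeros((24, 3))  # hypothetical single frame: 24 SMPL body joints, identity rotations
quats = axis_angle_to_quaternions(pose)
print(quats.shape)  # (24, 4)

A real exporter would also have to handle the coordinate-handedness difference between SMPL (right-handed) and Unity (left-handed), which this sketch ignores.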