audio2photoreal: is it real-time audio-to-face?

Hello,

Firstly, I want to extend my sincere thanks for the great work on this repository.

I have a question regarding the functionality: Is the audio-to-face feature designed to work in real-time?

Jan 04 '24 03:01 kingkong135

Depends on how much compute you throw at it and how fast your GPUs are. You can try it out on whatever compute you have available ;)

Jan 04 '24 03:01 alexanderrichard

I believe that if you skip the rendering portion, you can run this in real time locally on consumer devices. Incidentally, please do this for the community: https://github.com/facebookresearch/audio2photoreal/issues/4

Jan 05 '24 00:01 yosun

If anyone has issues with the environment, you can use Docker:

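# start a CUDA dev container; --gpus all exposes the host GPUs
# (assumes the NVIDIA Container Toolkit is installed on the host)
docker run -dit --gpus all --name a2p nvidia/cuda:11.6.1-devel-ubuntu20.04
docker exec -it a2p bash
# inside the container: install basic tools and system libraries
apt update
apt install -y vim git wget gcc ffmpeg libsm6 libxext6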

# install miniconda 
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
/root/miniconda3/bin/conda init bash 

# install repo ...

Jan 05 '24 07:01 kingkong135

From the demo description: "4) Then, sit back and wait for the rendering to happen! This may take a while (e.g. 30 minutes)". Not sure if this helps answer the question, but for a 6 s audio clip on a V100, I got the following times for a single sample:

100% 100/100 [00:17<00:00,  5.71it/s]
created 3 samples
100% 100/100 [00:07<00:00, 14.13it/s]
created 3 samples
100% 120/120 [02:36<00:00,  1.31s/it]

Not sure what the 3rd step is (I assume the avatar renderer is more performant). Anyway, while the first two networks are close to real time, the last step is roughly 26x slower than real time (2:36 for a 6 s clip), and that is on a decent GPU.
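For anyone who wants to redo this arithmetic on their own hardware, here is a minimal sketch that turns the elapsed times from the progress bars above into a real-time factor (processing time divided by audio length). The stage labels and the helper names `parse_mmss` and `real_time_factor` are made up for illustration and are not part of the repository; the 6 s clip length and the elapsed times are the ones quoted above.

```python
def parse_mmss(text: str) -> int:
    """Convert a tqdm-style 'MM:SS' elapsed string to whole seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def real_time_factor(elapsed_s: float, audio_s: float) -> float:
    """Seconds of processing per second of audio; 1.0 means real time."""
    return elapsed_s / audio_s


AUDIO_SECONDS = 6.0  # length of the test clip quoted above

# Elapsed times copied from the three progress bars; labels are hypothetical.
stages = {"stage 1": "00:17", "stage 2": "00:07", "stage 3": "02:36"}

factors = {
    name: real_time_factor(parse_mmss(elapsed), AUDIO_SECONDS)
    for name, elapsed in stages.items()
}

for name, rtf in factors.items():
    print(f"{name}: {rtf:.1f}x real time")
```

Anything above 1.0x means the stage cannot keep up with live audio. Note that the first two log lines report three samples each, so per-sample cost for those stages may be lower than these figures suggest.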

Jan 06 '24 14:01 wandrzej

Indeed, providing a bone dump to SMPL or Unity biped animation bones could eliminate the time-consuming third step and make this an actual real-time technology.

https://github.com/facebookresearch/audio2photoreal/issues/4

Jan 06 '24 23:01 yosun
