MakeItTalk

Is there any scope for making a lightweight model?

Open manupatel007 opened this issue 3 years ago • 3 comments

Hey! This is awesome work on animating faces from audio. I wanted to know: if the image (just a sketch) is fixed and only the audio varies, could we build a custom lightweight model, like MobileNet, that runs the "Generate input data for inference" and "Audio-to-Landmarks prediction" steps in real time using only the browser's WebGL?

manupatel007 avatar Jul 01 '21 05:07 manupatel007

That's a good idea. The audio-to-landmark part is already a lightweight model that can run in real time. The time-consuming parts are the image warping and the image translation network. You can try replacing the residual blocks in the current image translation network with MobileNet-style blocks to increase the speed.

yzhou359 avatar Jul 01 '21 13:07 yzhou359
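
A rough illustration of why the suggested swap could help (this is not taken from the MakeItTalk code base). The sketch below only counts weights: it compares a residual block built from two standard 3x3 convolutions with one built from two MobileNet-style depthwise separable convolutions. The channel width of 256, the two-convolution block layout, and all function names are assumptions made for the example. The actual image translation network may differ, and parameter count is only a proxy for inference speed.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def residual_block_params(channels, k=3, separable=False):
    """Model a residual block as two convolutions with equal in/out width."""
    per_conv = depthwise_separable_params if separable else conv_params
    return 2 * per_conv(channels, channels, k)


channels = 256  # assumed width, chosen only for illustration
standard = residual_block_params(channels)
light = residual_block_params(channels, separable=True)

print("standard block weights: ", standard)
print("separable block weights:", light)
print("reduction factor:        %.1fx" % (standard / light))
```

Under these assumptions, the separable block needs several times fewer weights per block. Whether that turns into real-time performance in a browser would still have to be measured.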

Great suggestion! It would be awesome to use this work in real time! Would this approach require retraining, or is there a way to do it with the available checkpoints? Thank you for sharing!

luantunez avatar Jul 01 '21 15:07 luantunez

We're working on a more powerful model (a v2), along with the training code. The training code for this version is not available for now. Checkpoints for the current version are available; please check the README in the repository root.

yzhou359 avatar Jul 01 '21 18:07 yzhou359