LiveSpeechPortraits Is my understanding correct?

Open JJJIANGJIANG opened this issue 2 years ago • 3 comments

Hello, my understanding of how the paper is divided is as follows: Section 3 is the practical application stage, where input audio drives an already-trained portrait model of a person, and Section 4 is where a wild video is given and the corresponding model is trained. Is my understanding correct? Thank you very much!

Apr 18 '22 01:04 JJJIANGJIANG

To be clearer: Sec. 3 describes the architecture, i.e., what it consists of and how it works (the forward pass). Sec. 4 describes how to build such a system; Sec. 3 only illustrates what the system is, not how to build it. That is why Sec. 4 is titled Implementation Details and Sec. 3 is titled Method.

Apr 18 '22 05:04 YuanxunLu

Is this written for the training process on a wild video, or for the process of running the program on input audio?

Apr 18 '22 05:04 JJJIANGJIANG

The proposed method is person-specific, so you need to train a separate model for each wild target person. Sec. 4 describes this process.

Apr 18 '22 06:04 YuanxunLu
