LiveSpeechPortraits
Is my understanding correct?
Hello, my understanding of how the paper is divided is as follows: Sec. 3 is the practical application stage, where audio-driven portrait animation is applied to an already trained model of a character, and Sec. 4 takes an in-the-wild video and trains the corresponding model. Is my understanding correct? Thank you very much!
To be clearer: Sec. 3 describes the architecture, i.e., what it consists of and how it works (the forward pass). Sec. 4 describes how to build such a system; Sec. 3 only illustrates what the system is, not how to build it. That is why Sec. 4 is titled Implementation Details and Sec. 3 is titled Method.
Is Sec. 4 written for the training process on an in-the-wild video, or for the process of running the program on input audio?
The proposed method is person-specific: you need to train a separate model for each in-the-wild target person, and Sec. 4 describes this process.
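To make the split concrete, here is a minimal Python sketch of the workflow described above. It is only an illustration under assumptions: `PersonSpecificModel`, `train`, `forward`, and the file names are hypothetical and are not the repository's actual API. Training (Sec. 4) happens once per target person, and the forward pass (Sec. 3) is only usable after that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonSpecificModel:
    """Hypothetical stand-in for one trained model; not the repo's API."""
    person_id: str
    source_video: Optional[str] = None

    def train(self, target_video: str) -> None:
        # Sec. 4 (Implementation Details): build the model from a video
        # of this specific target person. Real training is omitted here.
        self.source_video = target_video

    def forward(self, audio: str) -> str:
        # Sec. 3 (Method): the forward pass, audio in and frames out.
        if self.source_video is None:
            raise RuntimeError(f"no trained model for {self.person_id}")
        return f"frames of {self.person_id} driven by {audio}"


# One model per target person; a model trained on one person is not
# reused for another.
models = {}
for person, video in [("person_a", "a.mp4"), ("person_b", "b.mp4")]:
    model = PersonSpecificModel(person)
    model.train(video)
    models[person] = model

result = models["person_a"].forward("speech.wav")
```

Driving a new person with audio therefore requires going through the training step for that person first.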