synthesizing_obama_network_training
Tutorial
Hello, could you maybe make a tutorial covering how to run this, how to train the model, and how to input your own audio file? Thank you
I hope someone will provide one.
I have added some code and written a brief explanation on my repo, but this is only one part of the work. https://github.com/mrmotallebi/synthesizing_obama_network_training
Is the output of this the mouth shape features?
No, only 20D PCA points. If you refer to the paper, they say, "We reshape each 18-point mouth shape into a 36-D vector, apply PCA over all frames, and represent each mouth shape by the coefficients of the first 20 PCA coefficients". The data they provide for training ('frontalfidsCoeff_unrefined.bin' files in each of the training directories) are 20D as well. I don't know how they convert from PCA coefficients back to the lip shape features.
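For what it's worth, PCA coefficients in general map back to the original space as mean + coefficients × basis, provided the mean shape and the PCA basis are available. The sketch below shows that generic round trip on synthetic data. The function names, the (x, y) ordering used in the reshape, and the random landmarks are all assumptions for illustration; this is not the authors' code, and it does not show where their basis is stored.

```python
import numpy as np

def fit_pca(shapes, n_components=20):
    """Fit PCA on (n_frames, 36) flattened mouth shapes.

    Returns the mean shape (36,) and the top principal axes (n_components, 36).
    """
    mean = shapes.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, vt[:n_components]

def to_coeffs(shapes, mean, basis):
    """Project 36-D shapes onto the basis to get PCA coefficients."""
    return (shapes - mean) @ basis.T

def to_shapes(coeffs, mean, basis):
    """Approximate inverse: coefficients back to 18 (x, y) points per frame."""
    return (coeffs @ basis + mean).reshape(-1, 18, 2)

# Synthetic stand-in data (assumption); NOT the authors' landmarks.
rng = np.random.default_rng(0)
landmarks = rng.normal(size=(500, 18, 2))
flat = landmarks.reshape(len(landmarks), 36)

mean, basis = fit_pca(flat)
coeffs = to_coeffs(flat, mean, basis)   # shape (500, 20)
recon = to_shapes(coeffs, mean, basis)  # shape (500, 18, 2), lossy with 20 of 36 axes
```

With only 20 of the 36 axes kept, the reconstruction is an approximation; keeping all 36 would recover the input exactly.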
Hi @mrmotallebi what is the range of your validation loss after training for 300 epochs? I used your code for training the network and the validation loss is around 6, but in the paper it seemed to be around 4.5. I have no idea why it is this high, could you please give me some hints?
Sorry I wouldn't know either (nor do I recall it).
Hi @yqwen @mrmotallebi, I was able to get the loss around 4.7.
The 20D output doesn't generate lip shapes directly. Instead, the authors search the target videos for a set of best-matching frames.
Does anyone know how to use the text file with 20D PCA points to re-time and generate a video?
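As a rough illustration of what "best matching frames" could mean, here is a minimal nearest-neighbour sketch in coefficient space. Euclidean distance, the value of `k`, the function name, and the synthetic arrays are all assumptions made for the example; the paper's actual selection procedure may well differ.

```python
import numpy as np

def best_matching_frames(predicted, target, k=3):
    """For each predicted 20-D row, return indices of the k closest target rows.

    predicted: (n_pred, d) array, target: (n_target, d) array.
    Euclidean distance is an assumed metric, not taken from the paper.
    """
    # Pairwise squared distances, shape (n_pred, n_target).
    diff = predicted[:, None, :] - target[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    # Sort each row and keep the k nearest target indices.
    return np.argsort(dist2, axis=1)[:, :k]

rng = np.random.default_rng(0)
target = rng.normal(size=(200, 20))   # synthetic target-video coefficients
predicted = target[[5, 42]] + 1e-3    # two queries placed next to known frames
matches = best_matching_frames(predicted, target)
```

The broadcasted difference uses memory proportional to n_pred × n_target × d, so for long videos a chunked loop or a KD-tree would be a more practical choice.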
Doesn't the generated .txt output file contain 21 PCA coefficients?
@nikhitha-m Yes, at inference time I also get a .txt output with 21 PCA coefficients. Have you figured out why it is not 20D? Thank you
17008 21
0.012750 -16.166571 -0.586295 -3.136001 1.788596 -1.369637 -0.974732 0.319079 -0.114111 -0.296762 -0.186657 -0.135444 -0.000013 -0.238547 0.013007 -0.153888 -0.132900 -0.035417 -0.032935 -0.101287 0.013592
0.012750 -18.549662 0.012876 -2.928288 1.891728 -1.304868 -0.696324 0.288222 -0.134750 -0.416818 -0.129575 -0.167370 0.034044 -0.271542 0.019169 -0.216528 -0.137812 -0.048689 -0.040546 -0.133780 -0.007853
....
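Reading the sample above, the first two integers look like a frame count (17008) and a per-frame column count (21), followed by 21 values per frame. The first value is 0.012750 in both frames shown, so it may be something other than a PCA coefficient, which would leave 20 per frame. That is a guess from two rows, not something the repo confirms. Under those assumptions, a minimal parser (with a hypothetical function name) could look like this:

```python
def parse_coeff_text(text):
    """Parse 'n_rows n_cols v v v ...' into a list of rows of floats.

    The header meaning (row count, column count) is an assumption
    inferred from the sample output, not a documented format.
    """
    tokens = text.split()
    n_rows, n_cols = int(tokens[0]), int(tokens[1])
    values = [float(t) for t in tokens[2:]]
    # Keep only complete rows, so a truncated dump still parses.
    usable = min(n_rows, len(values) // n_cols)
    return [values[i * n_cols:(i + 1) * n_cols] for i in range(usable)]

# A shortened, made-up sample in the same layout (2 rows, 3 columns).
sample = "2 3  0.5 -1.0 2.0  0.5 -1.5 2.5"
rows = parse_coeff_text(sample)

# Dropping the first column leaves n_cols - 1 values per row
# (20 for the real file, if the guess about that column is right).
trimmed = [row[1:] for row in rows]
```

For the real output file, the text would be read from disk first and passed to the same function.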
+1 for a decent tutorial