Audio-driven-TalkingFace-HeadPose
Some questions about the dataset of LRW when training
Hi, thanks for the great work. I have a small question about the training process of the audio-to-expression-and-pose mapping part. At test time there is a finetune process, and I found in dataset.py that the class "News_1D_lstm_3dmm_pose" is selected for it.
I guess that when training on the LRW dataset, the corresponding dataset class should be "LRW_1D_lstm_3dmm_pose" — is that correct?
If so, I saw that there is a random index there: "r = random.choice([x for x in range(3, 8)])". It seems the MFCC features and expression features are randomly sampled in time. So during training, the features drawn for a given sample may differ between epochs because of this random selection. I am just wondering whether this causes any problems during training?
Thank you!
Yes, this part is random: during training we randomly take 16 frames from each 29-frame clip in the LRW dataset. We have not tested the influence of this randomness. Thank you for the reminder.
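For anyone else reading this thread, the sampling described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the feature shapes, the function name `sample_window`, and the exact way `r` indexes the clip are assumptions; only the start-index range mirrors the quoted `random.choice([x for x in range(3, 8)])`.

```python
import random
import numpy as np

def sample_window(mfcc, expr, win=16, lo=3, hi=7):
    """Pick one aligned 16-frame window from a 29-frame LRW clip.

    mfcc: (29, mfcc_dim) audio features
    expr: (29, expr_dim) 3DMM expression parameters
    Using the same start index for both keeps audio and
    expression features time-aligned despite the randomness.
    """
    r = random.choice(range(lo, hi + 1))      # random start frame, 3..7
    return mfcc[r:r + win], expr[r:r + win]   # both slices cover the same frames

# toy clip: 29 frames, 13-dim MFCC, 64-dim expression (dims are made up)
mfcc = np.zeros((29, 13))
expr = np.zeros((29, 64))
m, e = sample_window(mfcc, expr)
assert m.shape == (16, 13) and e.shape == (16, 64)
```

Because the largest start index (7) plus the window length (16) stays within the 29 frames, every draw yields a full window; the randomness only changes *which* 16 consecutive frames a sample contributes in a given epoch, acting like a form of temporal data augmentation.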
When using my own data for training, can I drop the random part and train on all frames of each video sample? Thanks for your project and your answer.
@yangchunyong have you solved this problem? What did you do when training on your own data — did you drop the random part and use all frames of the video sample?
hi~ @yangchunyong and @chenbolinstudent, could you please share the LRW dataset with me? I have applied for permission to use this dataset, but the administrator is on vacation right now. Quoting Rob Cooper: "Thanks for your message. I'm on holiday, back on Thursday 7th January. I'll get back to you then."
I'm in a hurry to use it, so could you please contact me by email: [email protected]. Thank you very much.