Lip2Wav-pytorch
Consistency of input frame length and generated waveform length
How can I ensure that the output waveform length is consistent with the input frame length? I train and test on the GRID dataset with the following hyperparameters: T = 40, overlap = 10, mel_step_size = 160, mel_overlap = 40, img_size = 96, fps = 25. In the test results, the ground-truth audio is 3 seconds long, while the generated waveforms are 7 seconds long. How can I solve this problem? Looking forward to your reply!
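For reference, here is a minimal sketch of the arithmetic behind these settings: it converts the video window and the mel window to seconds so the two can be compared. The names `sample_rate` and `hop_size` and their values (16000 and 200) are assumptions that are not stated above; they should be replaced with the values from the actual audio configuration. This only checks whether the windows cover the same duration and is not claimed to be the cause of the 3 s vs. 7 s difference.

```python
# Values taken from the post.
T = 40                # video frames per input window
overlap = 10          # video frames shared by consecutive windows
mel_step_size = 160   # mel frames per target window
mel_overlap = 40      # mel frames shared by consecutive windows
fps = 25              # video frames per second

# ASSUMPTION: these two values are not given in the post.
sample_rate = 16000   # audio samples per second (assumed)
hop_size = 200        # audio samples between mel frames (assumed)

# Mel frames produced per second of audio under the assumed settings.
mel_fps = sample_rate / hop_size

# Duration of each window in seconds.
video_window_s = T / fps
mel_window_s = mel_step_size / mel_fps
video_overlap_s = overlap / fps
mel_overlap_s = mel_overlap / mel_fps

# Mel sizes that would span the same duration as the video window.
matched_mel_step_size = round(T * mel_fps / fps)
matched_mel_overlap = round(overlap * mel_fps / fps)

print(f"video window: {video_window_s:.2f} s, mel window: {mel_window_s:.2f} s")
print(f"video overlap: {video_overlap_s:.2f} s, mel overlap: {mel_overlap_s:.2f} s")
print(f"matched mel_step_size: {matched_mel_step_size}, "
      f"matched mel_overlap: {matched_mel_overlap}")
```

Under the assumed audio settings, the video window and the mel window do not cover the same duration, so each predicted chunk would be longer than the video it was generated from. If the real `sample_rate` or `hop_size` differ, the numbers change accordingly.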