Wav2Lip
How to train SyncNet with 3 and 7 frames?
Hello, I want to train SyncNet with image sequences of 3 and 7 frames, but I'm not sure my configuration is correct. With 5 image frames, syncnet_T is 5 and syncnet_mel_step_size is 16, so one image frame corresponds to 3.2 audio (mel) frames. Does that mean that for 3-frame and 7-frame inputs the corresponding syncnet_mel_step_size would be 9.6 and 22.4? Neither of those is a whole number.
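The arithmetic in the question can be checked with a short Python sketch. It assumes Wav2Lip-style default settings (16 kHz audio, a hop_size of 200 samples, 25 fps video). Those settings give 80 mel frames per second and therefore 3.2 mel frames per video frame. The helper names below are hypothetical and are not part of the repository.

```python
import math
from fractions import Fraction

# Assumed Wav2Lip-style defaults (not read from the repo): 16 kHz audio,
# hop_size of 200 samples between mel frames, 25 video frames per second.
SAMPLE_RATE = 16000
HOP_SIZE = 200
FPS = 25


def mel_frames_per_video_frame(sample_rate=SAMPLE_RATE, hop_size=HOP_SIZE, fps=FPS):
    """Mel frames per second (sample_rate / hop_size) divided by video fps, kept exact."""
    return Fraction(sample_rate, hop_size) / fps


def mel_step_size(syncnet_T):
    """Exact (possibly fractional) mel window covering syncnet_T video frames."""
    return syncnet_T * mel_frames_per_video_frame()


for T in (3, 5, 7):
    exact = mel_step_size(T)
    # Report the exact value plus the nearest whole-number windows on either side.
    print(f"syncnet_T={T}: exact={float(exact)}, floor={math.floor(exact)}, ceil={math.ceil(exact)}")

# syncnet_T=3: exact=9.6, floor=9, ceil=10
# syncnet_T=5: exact=16.0, floor=16, ceil=16
# syncnet_T=7: exact=22.4, floor=22, ceil=23
```

Using Fraction keeps the 16/5 ratio exact, which makes it plain that under these assumed settings only multiples of 5 video frames map onto a whole number of mel frames.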