Wav2Lip
Higher temporal context window for SyncNet
I was wondering if there is a way to train SyncNet on a larger temporal context window, specifically 25 video frames and 80 mel steps (80 mel steps correspond to 1 second of audio, which matches 25 frames at 25 fps). It seems this would require major changes to the architecture. Also, if you look closely, the Wav2Lip generator's speech encoder shares the same architecture as SyncNet's speech encoder. Would the generator then need to output 25 frames before they are fed to the lip-sync discriminator?
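To make the question concrete, here is a rough PyTorch sketch of what a wider-window sync discriminator could look like (this is not the actual Wav2Lip code). The names `SyncNetWide`, `conv_block`, and `cosine_bce_loss` are made up, the channel counts and the 48x96 lower-half face crop size are assumptions, and adaptive pooling is used instead of working out exact strides. The point is only that the face branch would take 3*25 channels instead of 3*5, the audio branch would take an 80x80 mel window instead of 80x16, and both would still be trained with a cosine-similarity BCE objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 25          # video frames per window (25 fps -> 1 second)
MEL_STEPS = 80  # mel frames per window (80 mel steps -> 1 second)

def conv_block(cin, cout, stride=2):
    """Conv + BN + ReLU; halves the spatial size by default."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class SyncNetWide(nn.Module):
    """Hypothetical SyncNet-style discriminator with a 25-frame / 80-mel-step window.

    The face branch takes the T lower-half face crops concatenated channel-wise
    (3*T channels); the audio branch takes the (1, 80, MEL_STEPS) mel window.
    Both are reduced to 512-d embeddings.
    """
    def __init__(self, embed_dim=512):
        super().__init__()
        self.face_encoder = nn.Sequential(
            conv_block(3 * T, 64, stride=1),
            conv_block(64, 128),
            conv_block(128, 256),
            conv_block(256, 512),
            conv_block(512, embed_dim),
            nn.AdaptiveAvgPool2d(1),  # collapse remaining spatial dims
        )
        self.audio_encoder = nn.Sequential(
            conv_block(1, 64, stride=1),
            conv_block(64, 128),
            conv_block(128, 256),
            conv_block(256, 512),
            conv_block(512, embed_dim),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, faces, mels):
        # faces: (B, 3*T, H, W) lower-half face crops stacked on channels
        # mels:  (B, 1, 80, MEL_STEPS)
        f = self.face_encoder(faces).flatten(1)
        a = self.audio_encoder(mels).flatten(1)
        return F.normalize(a, dim=1), F.normalize(f, dim=1)

def cosine_bce_loss(audio_emb, face_emb, y):
    """Same contrastive idea as the 5-frame SyncNet: BCE on cosine similarity."""
    d = (F.cosine_similarity(audio_emb, face_emb) + 1) / 2  # map [-1, 1] -> [0, 1]
    return F.binary_cross_entropy(d, y)

if __name__ == "__main__":
    model = SyncNetWide()
    faces = torch.randn(4, 3 * T, 48, 96)   # assumed 48x96 lower-half crops
    mels = torch.randn(4, 1, 80, MEL_STEPS)
    y = torch.randint(0, 2, (4,)).float()   # 1 = in sync, 0 = out of sync
    a, f = model(faces, mels)
    print(cosine_bce_loss(a, f, y).item())
```

If something like this were used as the lip-sync discriminator, the generator side would presumably also have to produce 25 consecutive frames per sample (and the data loader would have to serve 25-frame / 80-mel-step windows) before the sync loss could be computed.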
Any tips on this would be appreciated. I think a larger context window could achieve even better sync.