Lip2Wav

WER

Open • Domhnall-Liopa opened this issue on Dec 07 '21 • 0 comments

Hi, thanks for the great work.

When I test the pre-trained multi-speaker model on the LRW test set, I get STOI and ESTOI values similar to those quoted in the paper, but the best WER I can achieve is 79.6%, compared with the 34.2% reported in the paper.

Could you describe the steps you used to obtain 34.2% WER with Google ASR? Do you crop the synthesised audio to the target word, and do you use a specific Google ASR model or configuration? Do you evaluate on the entire LRW test set or only a subset?
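Since the scoring protocol is exactly what is unclear here, the following is only an illustrative sketch of one common way to score ASR transcripts, not the method used in the paper. It assumes WER is the word-level Levenshtein distance summed over the whole test set and divided by the total number of reference words, and that both texts are lowercased and stripped of punctuation first. The function names (`normalise`, `edit_distance`, `corpus_wer`) and the normalisation rules are hypothetical choices. Details like this matter: LRW word labels are upper-case (e.g. `ABOUT`), while ASR output is typically mixed-case, so skipping normalisation alone would count every correct word as a substitution.

```python
import re


def normalise(text: str) -> list[str]:
    """Assumed normalisation: lowercase, replace punctuation with spaces, split on whitespace."""
    text = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return text.split()


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution, or match if cost == 0
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over (reference, hypothesis) pairs."""
    errors = 0
    ref_words = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = normalise(ref_text), normalise(hyp_text)
        errors += edit_distance(ref, hyp)
        ref_words += len(ref)
    return errors / ref_words


if __name__ == "__main__":
    # Hypothetical LRW-style labels paired with hypothetical ASR transcripts.
    pairs = [("ABOUT", "about"), ("ABSOLUTELY", "absolutely."),
             ("ACCESS", "axis"), ("ACCUSED", "")]
    print(corpus_wer(pairs))  # 0.5 (one substitution + one deletion over 4 words)
```

Because every LRW clip has a single reference word, this corpus-level WER equals the mean per-clip WER; an empty ASR result counts as a deletion rather than being skipped, which is another choice that can move the final number.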

Knowing this would help ensure fair comparisons in future research.

Thanks

Domhnall-Liopa, Dec 07 '21 18:12