Visual_Speech_Recognition_for_Multiple_Languages
Is there an audio-visual Chinese model?
Thanks for releasing this awesome work! I noticed that the Chinese lip-reading model uses only the visual modality. I tried the visual model, but it performed poorly on the example video clips, as in https://github.com/mpc001/Visual_Speech_Recognition_for_Multiple_Languages/issues/5. Is there an audio-visual version that might achieve better results?
Thanks.