Visual_Speech_Recognition_for_Multiple_Languages

Is there an audio-visual Chinese model?

Open cooelf opened this issue 1 year ago • 0 comments

Thanks for releasing this awesome work! I noticed that the Chinese lip-reading model uses only the visual modality. I tried the visual model, but it performed poorly on the example video clips (see https://github.com/mpc001/Visual_Speech_Recognition_for_Multiple_Languages/issues/5). Is there an audio-visual version that might achieve better results?

Thanks.

cooelf, Jun 01 '23 11:06