
How to adapt or train AV-HuBERT for other languages?

Open • cooelf opened this issue 1 year ago • 1 comment

Thanks for the awesome work! I am wondering if it is possible to make AV-HuBERT work for other languages, e.g., Chinese.

I noticed that the paper mentions a multilingual version. Is it compatible with different languages? If not, could you provide any suggestions, assuming a Chinese lip-movement dataset is available?

Thanks!

cooelf • May 18 '23 13:05

@cooelf Yes, AV-HuBERT should also work for other languages. You can choose a pre-trained checkpoint (large or base) and fine-tune it on a Chinese lip-reading dataset, following the fine-tuning command; refer to this for how to prepare the data. Alternatively, pre-training a Chinese AV-HuBERT model from scratch is also doable if you have a sufficiently large amount of audio-visual data.
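As a rough illustration of the data-preparation step, here is a minimal sketch that writes a manifest and a transcript file for one split. It assumes the layout I understand the repository's loader to expect: a `{split}.tsv` whose first line is the data root, followed by tab-separated rows of clip id, video path, audio path, number of video frames, and number of audio samples, plus a `{split}.wrd` with one transcript per line in the same order. The `Clip` record, the `write_manifest` helper, and the example paths are hypothetical names for illustration, not part of AV-HuBERT, so please check the layout against the data-preparation instructions before relying on it.

```python
from pathlib import Path
from typing import Iterable, NamedTuple, Tuple


class Clip(NamedTuple):
    """Hypothetical record describing one preprocessed clip."""
    clip_id: str
    video_path: str          # path relative to the data root (assumed)
    audio_path: str          # path relative to the data root (assumed)
    num_video_frames: int
    num_audio_samples: int
    transcript: str


def write_manifest(clips: Iterable[Clip], root: str, out_dir: str,
                   split: str) -> Tuple[Path, Path]:
    """Write {split}.tsv and {split}.wrd under out_dir (assumed layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    tsv_path = out / f"{split}.tsv"
    wrd_path = out / f"{split}.wrd"
    with open(tsv_path, "w", encoding="utf-8") as tsv, \
            open(wrd_path, "w", encoding="utf-8") as wrd:
        # First manifest line: the root directory of the data.
        tsv.write(f"{root}\n")
        for c in clips:
            row = [c.clip_id, c.video_path, c.audio_path,
                   str(c.num_video_frames), str(c.num_audio_samples)]
            tsv.write("\t".join(row) + "\n")
            # Transcripts go in a parallel file, one line per clip.
            wrd.write(c.transcript + "\n")
    return tsv_path, wrd_path
```

You would call it once per split, e.g. `write_manifest(train_clips, "/path/to/data", "/path/to/manifests", "train")`. How the Chinese transcripts should be tokenized (characters, pinyin, or subword units) is a separate choice that this sketch does not make.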

We mentioned a multilingually pre-trained AV-HuBERT in the paper, but that model was not released because it is not as good as the English-only one on the LRS3 benchmark. JFYI, we did fine-tune AV-HuBERT multilingually in our follow-up work, and you can find the model checkpoints in this repo.

chevalierNoir • Jun 03 '23 22:06