
How to pre-train a multilingual HuBERT?

Open tarudesu opened this issue 1 year ago • 3 comments

@wnhsu I am curious about how to pre-train a multilingual HuBERT like this. Also, can I continue pre-training HuBERT on another language by loading the original HuBERT checkpoint and resuming training on a dataset in that language?

Could someone explain this to me, please? Thank you in advance!

tarudesu avatar Mar 12 '23 16:03 tarudesu
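One possible starting point: fairseq's hydra trainer can load an existing checkpoint and continue pre-training on a new dataset. Below is a minimal sketch, not a verified recipe; all paths are placeholders, and the `checkpoint.*` overrides follow fairseq's checkpoint config, so double-check them against your fairseq version:

```shell
# Sketch: resume HuBERT pre-training from an existing checkpoint on a
# new language's data. Paths and manifest/label dirs are placeholders.
fairseq-hydra-train \
  --config-dir fairseq/examples/hubert/config/pretrain \
  --config-name hubert_base_librispeech \
  task.data=/path/to/new_lang/manifests \
  task.label_dir=/path/to/new_lang/km_labels \
  checkpoint.restore_file=/path/to/hubert_base_ls960.pt \
  checkpoint.reset_optimizer=true \
  checkpoint.reset_lr_scheduler=true \
  checkpoint.reset_dataloader=true \
  checkpoint.reset_meters=true
```

The `reset_*` flags are the key choice here: without them, fairseq treats the checkpoint as an interrupted run of the same job and tries to restore its optimizer state and data iterator, which will not match the new dataset.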

I am also interested in running the same kind of experiments. I would like to continue pre-training a wav2vec 2.0 model on a new language/dataset rather than pre-training from scratch. Please let me know if you find an answer.

asadullah797 avatar Apr 22 '24 18:04 asadullah797

I gave up on this a while ago, @asadullah797, but I am still looking for a simple technical approach too.

tarudesu avatar Apr 22 '24 18:04 tarudesu

I am also looking for similar options, and I searched to see whether someone had posted this question before. I think I have found something similar here: https://github.com/mailong25/self-supervised-speech-recognition. Steps 1.1 and 1.2 seem to do the trick; I am going to give it a try.

asadullah797 avatar Apr 22 '24 18:04 asadullah797
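One detail worth noting for the HuBERT case specifically: unlike wav2vec 2.0, HuBERT pre-training needs frame-level pseudo-labels for the new data, typically produced by running k-means over acoustic features (fairseq ships scripts for this under `examples/hubert/simple_kmeans`, which use scikit-learn's MiniBatchKMeans). As a toy illustration of that labeling step only, here is a self-contained Lloyd's k-means sketch over feature frames; it is not the fairseq implementation:

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(frames, k, iters=20, seed=0):
    """Toy Lloyd's k-means: returns one cluster id per feature frame.

    In HuBERT pre-training these ids would serve as the masked-prediction
    targets for the new language's audio.
    """
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    labels = [0] * len(frames)
    for _ in range(iters):
        # Assignment step: nearest centroid per frame.
        labels = [min(range(k), key=lambda c: dist2(f, centroids[c]))
                  for f in frames]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return labels
```

For real use, the fairseq scripts dump features (MFCC, or hidden states of an earlier HuBERT iteration), fit k-means on a subset, then write one label file per audio manifest; the sketch above only shows the clustering logic itself.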