
Help with replicating the results for Hubert Pretraining


❓ Questions and Help

What is your question?

I am trying to replicate HuBERT base pretraining, iteration 1, on LibriSpeech 960h. However, the training curves look odd: the accuracy on unmasked frames degrades quickly. It seems the model is not converging correctly. Could this be related to my k-means targets? Are there any secret ingredients for training HuBERT? What do the curves normally look like when the model is trained correctly?

[Two screenshots of training curves attached, showing masked and unmasked prediction accuracy over training steps]
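For reference, here is a minimal sketch of how iteration-1 targets are commonly produced (MFCC features clustered with k-means). This only illustrates the general idea behind the recipe in examples/hubert/simple_kmeans; the feature settings, cluster count, and file paths below are assumptions, not the official configuration.

```python
# Sketch: k-means targets for HuBERT iteration 1.
# Assumptions: 39-dim MFCC (13 cepstra + deltas + delta-deltas) and 100 clusters;
# the official recipe is in fairseq's examples/hubert/simple_kmeans.
import torch
import torchaudio
from sklearn.cluster import MiniBatchKMeans

def mfcc_features(wav_path: str) -> torch.Tensor:
    wav, sr = torchaudio.load(wav_path)                       # (1, T) waveform
    mfcc = torchaudio.compliance.kaldi.mfcc(
        wav, sample_frequency=sr, use_energy=False
    )                                                         # (frames, 13)
    deltas = torchaudio.functional.compute_deltas(mfcc.T)     # delta over time axis
    ddeltas = torchaudio.functional.compute_deltas(deltas)
    return torch.cat([mfcc.T, deltas, ddeltas], dim=0).T      # (frames, 39)

# Fit k-means on features pooled from (a subset of) the training data...
train_paths = ["a.flac", "b.flac"]                            # placeholder paths
feats = torch.cat([mfcc_features(p) for p in train_paths])
km = MiniBatchKMeans(n_clusters=100, batch_size=10000, n_init=20, max_iter=100)
km.fit(feats.numpy())

# ...then assign one cluster id per frame; these ids are the pretraining targets.
labels = km.predict(mfcc_features("a.flac").numpy())
```

If the cluster assignments are noisy or collapsed (e.g. a few clusters dominate), the targets carry little information and the curves can look strange, so it is worth checking the label distribution before blaming the model.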

What's your environment?

  • fairseq Version (main)
  • PyTorch Version (1.12.1)
  • OS (ubuntu)
  • How you installed fairseq: source
  • Build command you used (if compiling from source):
  • Python version: 3.8
  • CUDA/cuDNN version: 11.3, 8.3
  • GPU models and configuration: 4× A100
  • Any other relevant information:

a43992899 · Sep 25, 2022

Can you share what the WER is when you fine-tune the model? The curves don't seem weird to me, since the accuracy on masked tokens keeps improving, and that is what the model is optimized for.
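For context, here is a rough sketch of how the two logged accuracies relate to the objective: the cross-entropy is typically applied only (or almost only) on masked frames, so masked accuracy tracks the optimization target, while unmasked accuracy is just a diagnostic and can drift down. The function and tensor names below are illustrative, not fairseq's actual logging code.

```python
import torch
import torch.nn.functional as F

def masked_prediction_metrics(logits, targets, mask):
    """Illustrative sketch only.
    logits:  (B, T, C) frame-level predictions over k-means cluster ids
    targets: (B, T)    cluster ids from the k-means step
    mask:    (B, T)    bool, True where the frame was masked
    In the default HuBERT setup, the loss weight on unmasked frames is 0,
    so only the masked-frame term below is actually optimized."""
    pred = logits.argmax(dim=-1)
    acc_masked = (pred[mask] == targets[mask]).float().mean()
    acc_unmasked = (pred[~mask] == targets[~mask]).float().mean()
    loss = F.cross_entropy(logits[mask], targets[mask])  # masked frames only
    return loss, acc_masked, acc_unmasked
```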

wnhsu · Sep 28, 2022