
Is there any solution to reproduce the data2vec 2.0 pre-trained model with 4 GPUs?

nohhg92 opened this issue on Nov 16, 2023 · 0 comments

❓ Questions and Help

What is your question?

I followed the fairseq data2vec 2.0 example (https://github.com/facebookresearch/fairseq/tree/main/examples/data2vec) for speech recognition research and tried to reproduce the data2vec 2.0 pre-trained model. I have a 4-GPU environment for running the data2vec 2.0 training code. Fine-tuning the provided Base pre-trained model (93M parameters) on 100 hours of Librispeech worked well: the resulting model reaches 6.8% WER with a 4-gram LM.

The problem occurred when I ran the pre-training myself. I tried several pre-training runs, but the most accurate model I obtained reaches only 9.0% WER under the same conditions (960h Librispeech Base pre-training + 100h fine-tuning + 4-gram LM), whereas the original paper reports 6.4% WER.

Is there any way to reproduce the pre-trained model with only 4 GPUs? My batch size is smaller than the original, so I assume some parameters in the config file need to change. Which parameters should I adjust to compensate for the smaller batch size? Alternatively, is there a way to train with the original effective batch size on 4 GPUs?
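For context, my understanding is that fairseq can simulate a larger world size through gradient accumulation via the `optimization.update_freq` override, which keeps the effective batch size at the reference value. Below is a minimal sketch of the command I would try, assuming for illustration that the reference config targets 16 GPUs (the actual value should be read from `distributed_training.distributed_world_size` in `examples/data2vec/config/v2/base_audio_only_task.yaml`); with 4 GPUs that would give `update_freq = 16 / 4 = 4`:

```bash
# Sketch: pre-train data2vec 2.0 Base on 4 GPUs while keeping the effective
# batch size of the reference setup. Assumption: the reference config targets
# 16 GPUs, so each GPU accumulates gradients over 16 / 4 = 4 steps.
python fairseq_cli/hydra_train.py -m \
    --config-dir examples/data2vec/config/v2 \
    --config-name base_audio_only_task \
    task.data=/path/to/librispeech/manifests \
    distributed_training.distributed_world_size=4 \
    optimization.update_freq='[4]'
```

Is adjusting `update_freq` like this sufficient, or do other hyperparameters (e.g. the learning rate schedule or total number of updates) also need re-tuning in this setting?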

Code

What have you tried?

What's your environment?

  • fairseq Version (e.g., 1.0 or main):
  • PyTorch Version (e.g., 1.0):
  • OS (e.g., Linux):
  • How you installed fairseq (pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:
