fairseq
Is there any solution to reproduce the data2vec 2.0 pre-trained model with 4 GPUs?
❓ Questions and Help
What is your question?
I followed the fairseq data2vec 2.0 example (https://github.com/facebookresearch/fairseq/tree/main/examples/data2vec) for speech recognition research and tried to reproduce the data2vec 2.0 pre-trained model. I have a 4-GPU environment for running the data2vec 2.0 training code, and by fine-tuning the provided Base pre-trained model (93M parameters) on 100 hours of Librispeech I successfully trained a model that reaches 6.8% WER with a 4-gram LM.

The problem occurred when I trained a pre-trained model myself. I tried several pretraining procedures, but the most accurate model I obtained only reaches 9.0% WER (960h Librispeech Base pretraining + 100h fine-tuning + 4-gram LM), whereas the original paper reports 6.4% WER under the same conditions.

Is there any way to reproduce the pre-trained model with only 4 GPUs? Since my batch size is smaller than the original, I presumably need to change some parameters in the config file. Which parameters should I adjust to reproduce the model with a small batch size? Or is there a way to train with the original effective batch size on 4 GPUs?
Code
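Not part of the original post, but a minimal sketch of the usual fairseq approach to this: keep the *effective* batch size constant by raising `optimization.update_freq` (gradient accumulation) in proportion to the missing GPUs. The config name, paths, and the assumed original world size below are assumptions; check the released data2vec 2.0 config for the actual `distributed_training.distributed_world_size` it was tuned for.

```shell
# Hypothetical sketch (not from the post): pretrain on 4 GPUs while
# matching the original effective batch size via gradient accumulation.
# If the released config assumes a world size of 16, accumulating over
# update_freq = 16 / 4 = 4 steps keeps the effective batch unchanged.
fairseq-hydra-train -m \
    --config-dir examples/data2vec/config/v2 \
    --config-name base_audio_only_task \
    task.data=/path/to/librispeech/manifests \
    distributed_training.distributed_world_size=4 \
    optimization.update_freq='[4]'
```

Note that with batch-norm-free models this usually matches the original optimization closely, though per-GPU memory limits may still force a smaller `dataset.max_tokens`, in which case `update_freq` should be scaled up further to compensate.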
What have you tried?
What's your environment?
- fairseq Version (e.g., 1.0 or main):
- PyTorch Version (e.g., 1.0)
- OS (e.g., Linux):
- How you installed fairseq (pip, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information: