
Speech recognition reproducibility

Open Bobrosoft98 opened this issue 6 years ago • 15 comments

Hi,

I am having trouble reproducing the speech recognition results. With the default settings, the model stagnates at 25% train accuracy. By switching to a different optimizer, increasing the batch size, and tuning the learning rate, I was able to reach 8% WER, but that is still far from the ~5% WER that is claimed to be achievable without any tuning.

Could you please provide additional info about your configuration (the model and number of GPUs, the total batch size), or even better: logs and/or model checkpoints?

Thank you.

Bobrosoft98 avatar Sep 17 '19 11:09 Bobrosoft98

@okhonko

huihuifan avatar Sep 17 '19 11:09 huihuifan

Hi,

I'm having similar results on 1 GPU for a different dataset. Could you share with us the parameters you used to improve the results?

Thank you

carlosep93 avatar Sep 19 '19 07:09 carlosep93

Hi, I was having similar issues but was able to do better with the default settings on one GPU by simulating the larger batch size with --update-freq 16.
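
In other words, the repo's example training command unchanged, plus one flag; schematically (copy the actual base flags from examples/speech_recognition/README.md rather than from here):

```
# $REPO_FLAGS = whatever the README specifies (task, arch, criterion, optimizer, ...)
python train.py $DATA_DIR $REPO_FLAGS \
  --max-tokens 5000 \
  --update-freq 16   # accumulate gradients over 16 batches before each optimizer step
```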

alexbie98 avatar Sep 23 '19 20:09 alexbie98

@alexbie98 I actually used this parameter when training on 1 GPU, and it didn't help. Can you elaborate on "do better"? Did you replicate the paper's WER?

@carlosep93 My parameters were: --optimizer adam --lr 5e-4 --fp16 --memory-efficient-fp16 --warmup-updates 2500 --update-freq 4

I also changed the batching logic to pack as much data onto each GPU as possible, resulting in an average batch size of 670 across all 8 GPUs. Only after that did it start training properly.
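
Put together, the run looked roughly like this ($REPO_FLAGS stands for the example's task/arch/criterion flags, minus the optimizer settings overridden here; depending on the fairseq version, --warmup-updates may also need an --lr-scheduler that implements warmup, e.g. inverse_sqrt):

```
# 8-GPU run; note that the batching code itself was also modified,
# so these flags alone won't reproduce the ~670-utterance batches
python train.py $DATA_DIR $REPO_FLAGS \
  --optimizer adam --lr 5e-4 --warmup-updates 2500 \
  --fp16 --memory-efficient-fp16 \
  --update-freq 4
```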

Bobrosoft98 avatar Sep 24 '19 18:09 Bobrosoft98

Right now it's at 96% train acc / 91.7% valid acc after training for 5 days (epoch 31). I have not yet matched the reported WER; I'm getting 9.9 with the current checkpoint. The loss/accuracy plateaus for a while before the loss drops quite low.

https://i.imgur.com/XBL1TZo.png

alexbie98 avatar Sep 24 '19 18:09 alexbie98

Wow, that looks nice! What batch size do you have? Also, could you share the accuracy plot?

Bobrosoft98 avatar Sep 24 '19 19:09 Bobrosoft98

https://i.imgur.com/dKadcXq.png

The effective batch size is 80k. My training command is the same as the one in the repo with --update-freq 16

alexbie98 avatar Sep 24 '19 20:09 alexbie98

Thanks for providing the plot! Are you sure about 80k? I think the whole LibriSpeech train set has around 200k utterances, which would mean only about 3 batches per epoch in your case.

Bobrosoft98 avatar Sep 24 '19 23:09 Bobrosoft98

Sorry, I meant 80k tokens*. Using the default command's --max-tokens 5000 with --update-freq 16, the average number of sentences is around 60.
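
To spell out the arithmetic:

```
# effective tokens per optimizer step = --max-tokens x --update-freq
echo $(( 5000 * 16 ))   # 80000
```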

alexbie98 avatar Sep 25 '19 14:09 alexbie98

> https://i.imgur.com/dKadcXq.png
> The effective batch size is 80k. My training command is the same as the one in the repo with --update-freq 16

Sorry for the off-topic reply, but could you share how you plot the training accuracy?

edosyhptra avatar Apr 18 '21 14:04 edosyhptra

> Sorry for the off-topic reply, but could you share how you plot the training accuracy?

If I recall correctly, specifying a directory via --tensorboard-logdir will generate these plots, viewable from tensorboard. I haven't used this in a while though.
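
Roughly like this (depending on the fairseq version, you may also need the tensorboardX package installed for the event files to actually get written):

```
# during training: write tensorboard event files next to the checkpoints
python train.py $DATA_DIR $REPO_FLAGS --tensorboard-logdir $SAVE_DIR/tb

# then view the training/validation curves in a browser
tensorboard --logdir $SAVE_DIR/tb
```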

alexbie98 avatar Apr 19 '21 03:04 alexbie98

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

stale[bot] avatar Jul 21 '21 00:07 stale[bot]

@alexbie98 do you still have the code/command you used for that run?

itsmekhoathekid avatar Mar 01 '25 05:03 itsmekhoathekid

I don't have the code, but I have notes on the hyperparameters: adadelta with lr=1.0; --update-freq=16 with 5k max tokens (i.e. 80k effective tokens per update); dropout=0.15; gradient clipping=10.0.
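
Mapped onto flags, that would be roughly the following ($REPO_FLAGS = the example's task/arch/criterion flags; the dropout flag in particular depends on the model definition, so treat it as approximate):

```
# reconstructed from notes, not the original script
python train.py $DATA_DIR $REPO_FLAGS \
  --optimizer adadelta --lr 1.0 \
  --clip-norm 10.0 \
  --max-tokens 5000 --update-freq 16 \
  --dropout 0.15   # flag name may differ for this architecture
```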

alexbie98 avatar Mar 03 '25 05:03 alexbie98

Damn, I've reduced the number of params, tried different optimizers and a bigger batch size, and ran into overfitting lol

itsmekhoathekid avatar Mar 03 '25 16:03 itsmekhoathekid