pytorch-seq2seq
Tutorial 6 [Attention Is All You Need]: Different output at different batch sizes during inference
I have trained a transformer encoder-decoder model by replacing the encoder with a pre-trained model and putting the decoder-related code from Tutorial 6 (Attention Is All You Need) on top of it, and the model converges properly as training proceeds. Still, when I perform sequential greedy decoding after training, I get different WER and CER on my validation data depending on the batch size.
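For reference, my greedy decoding loop looks roughly like the sketch below. The `model.encoder` / `model.decoder` / `make_trg_mask` interface follows Tutorial 6's `Seq2Seq` class; since my actual encoder is a pre-trained model, treat the exact names and mask handling as illustrative rather than my exact code:

```python
import torch

@torch.no_grad()
def greedy_decode(model, src, src_mask, sos_idx, eos_idx, max_len=100):
    """Batched greedy decoding; interface assumed from Tutorial 6's Seq2Seq class."""
    model.eval()
    device = src.device
    batch_size = src.shape[0]

    # Encode the whole (padded) source batch once.
    enc_src = model.encoder(src, src_mask)            # [batch, src_len, hid_dim]

    # Start every sequence with <sos>.
    trg = torch.full((batch_size, 1), sos_idx, dtype=torch.long, device=device)
    finished = torch.zeros(batch_size, dtype=torch.bool, device=device)

    for _ in range(max_len):
        trg_mask = model.make_trg_mask(trg)           # causal (+ pad) mask for the target
        output, _ = model.decoder(trg, enc_src, trg_mask, src_mask)
        next_token = output[:, -1, :].argmax(dim=-1)  # greedy pick at the last position
        trg = torch.cat([trg, next_token.unsqueeze(1)], dim=1)

        # Track which samples have emitted <eos>; this is the count reported in the table below.
        finished |= next_token.eq(eos_idx)
        if finished.all():
            break

    return trg
```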
My validation set has 5437 samples. During inference I also tracked the number of samples in which EOS was predicted. Below are the observations I'm getting:
| Batch size | WER | CER | Samples with EOS predicted (of 5437) |
|---|---|---|---|
| 1 | 0.859 | 0.672 | 5427 |
| 2 | 0.526 | 0.399 | 3915 |
| 4 | 0.378 | 0.279 | 4866 |
| 8 | 0.33 | 0.239 | 5199 |
| 16 | 0.326 | 0.235 | 5301 |
| 32 | 0.325 | 0.235 | 5361 |
| 64 | 0.326 | 0.235 | 5394 |
| 128 | 0.326 | 0.235 | 5406 |
I don't know what is causing this issue. Any idea what might be causing this behavior in the transformer model?