attention-is-all-you-need-pytorch

slow and inaccurate

Open xiaoshingshing opened this issue 5 years ago • 2 comments

I ran:

python train.py -data_pkl ./bpe_deen/bpe_vocab.pkl -train_path ./bpe_deen/deen-train -val_path ./bpe_deen/deen-val -log deen_bpe -embs_share_weight -proj_share_weight -label_smoothing -save_model trained -b 64 -warmup 128000 -epoch 400

However, training is slow and the accuracy is low.

[ Epoch 306 ]
  - (Training)   ppl:  69.81372, accuracy: 39.721 %, elapse: 24.194 min
  - (Validation) ppl:  501.22445, accuracy: 18.472 %, elapse: 0.088 min
[ Epoch 307 ]
  - (Training)   ppl:  69.74481, accuracy: 39.743 %, elapse: 24.189 min
  - (Validation) ppl:  463.02458, accuracy: 19.354 %, elapse: 0.089 min
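
For reading these logs: perplexity is usually the exponential of the average per-token cross-entropy, so taking the logarithm puts the training and validation numbers on the same scale as the loss. The sketch below assumes that definition holds for this script's `ppl` column (natural log, per token). `ppl_to_loss` is a hypothetical helper, not something from train.py.

```python
import math

def ppl_to_loss(ppl):
    """Convert perplexity to average per-token cross-entropy in nats,
    assuming ppl = exp(loss)."""
    return math.log(ppl)

# Values copied from the Epoch 306 log lines above.
train_loss = ppl_to_loss(69.81372)
val_loss = ppl_to_loss(501.22445)

print(f"train loss: {train_loss:.3f} nats/token")
print(f"val loss:   {val_loss:.3f} nats/token")
print(f"gap:        {val_loss - train_loss:.3f} nats/token")
```

This only restates the logged gap on a log scale and does not identify a cause. If label smoothing is applied to the training loss but not to the validation loss, the two values are also not directly comparable.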

I changed the batch size from 256 to 64 because of limited CUDA memory. Is this the reason?
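
On whether the smaller batch matters: the `-warmup` value counts optimizer steps, not sentences, so cutting `-b` from 256 to 64 while keeping `-warmup 128000` changes how much data the model sees before the learning rate peaks. The sketch below uses the schedule from the Transformer paper, lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5). That this repository's optimizer follows it unmodified, and that `d_model` is 512, are assumptions, and `noam_lr` is a hypothetical name.

```python
def noam_lr(step, warmup, d_model=512):
    """Learning rate at a 1-based optimizer step under the
    Transformer paper's warmup schedule."""
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

WARMUP = 128000  # from the -warmup flag in the command above

for batch_size in (256, 64):
    # Sentences processed before the learning rate reaches its peak.
    sentences_in_warmup = WARMUP * batch_size
    print(f"batch {batch_size}: {sentences_in_warmup:,} sentences during warmup")

# The peak is reached at step == warmup and depends only on warmup and d_model.
peak = noam_lr(WARMUP, WARMUP)
print(f"peak lr at step {WARMUP}: {peak:.3e}")

# Early in training the rate is far below the peak.
print(f"lr at step 1000: {noam_lr(1000, WARMUP):.3e}")
```

With a batch of 64, each step's gradient is also noisier than with 256, so the same schedule is not guaranteed to behave the same way. Whether that explains the numbers in this thread is not established here. Comparing against a run with a shorter warmup, or with gradients accumulated over four batches of 64, would be one way to check.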

xiaoshingshing, Feb 17 '20 09:02

Hello, I ran into exactly the same issue. Have you found the cause?

zhang-mohole, May 09 '20 15:05

I have the same issue too.

demdecuong, Aug 20 '20 07:08