attention-is-all-you-need-pytorch

Convergence / Overfit issue

Open astariul opened this issue 5 years ago • 2 comments

This repo is really well-written, so I decided to use it for my task: question generation.

In my architecture, I'm using only DecoderTransformer (not the whole Transformer). But I have a convergence issue, similar to #101, where the model can't overfit 10 samples.
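
For reference, the sanity check described here (memorizing a handful of samples) can be sketched roughly as below. This is a minimal sketch and not code from the repo: `overfit_small_batch` and the toy embedding model are hypothetical stand-ins, and the real decoder would be passed in place of `toy_model`. If the loss on one fixed batch of 10 samples does not drop steadily, the problem is likely in the model or the optimization setup rather than in the data.

```python
import torch
import torch.nn as nn


def overfit_small_batch(model, inputs, targets, steps=300, lr=1e-2):
    """Train on one fixed batch and return the loss recorded at every step."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(inputs)  # expected shape: (batch, seq_len, vocab)
        loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


# Hypothetical toy setup: 10 samples of 5 tokens each, vocabulary of 20 tokens.
torch.manual_seed(0)
vocab_size = 20
toy_model = nn.Sequential(nn.Embedding(vocab_size, 16), nn.Linear(16, vocab_size))
inputs = torch.randint(0, vocab_size, (10, 5))
targets = (inputs + 1) % vocab_size  # a mapping the toy model can memorize

losses = overfit_small_batch(toy_model, inputs, targets)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

A healthy setup should show the last loss far below the first one on such a tiny batch.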

As mentioned, I tried decreasing the learning rate as well as changing the optimizer, but nothing works: my model simply never converges.

I'm wondering if anyone else has run into convergence issues, and how they resolved them. Thanks for the help!

astariul • May 14 '19 08:05

I managed to make it converge:

  • By using BertAdam from pytorch-pretrained-BERT
  • By not sharing the weights of the final target embeddings (tgt_emb_prj_weight_sharing = False)
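
One possible reason the optimizer change helped: BertAdam applies a learning-rate warmup schedule on top of Adam, and Transformers are often sensitive to the learning rate in the first steps. The sketch below illustrates the general idea of a linear warmup followed by a linear decay. It is an illustration under my own assumptions, not BertAdam's exact formula: the function name `warmup_linear_lr` and the default `warmup_fraction` are hypothetical.

```python
def warmup_linear_lr(step, total_steps, base_lr, warmup_fraction=0.1):
    """Learning rate at `step`: linear ramp up to base_lr, then linear decay to 0.

    Illustrative only; names and defaults are assumptions, not a library API.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0.0 < warmup_fraction < 1.0:
        raise ValueError("warmup_fraction must be strictly between 0 and 1")
    progress = step / total_steps
    if progress < warmup_fraction:
        # Warmup phase: scale the rate from 0 up to base_lr.
        return base_lr * progress / warmup_fraction
    # Decay phase: scale the rate from base_lr down to 0 at the last step.
    return base_lr * max(0.0, (1.0 - progress) / (1.0 - warmup_fraction))


# Print the schedule at a few points of a 100-step run.
for step in (0, 5, 10, 55, 100):
    print(step, warmup_linear_lr(step, total_steps=100, base_lr=1e-3))
```

With a schedule like this, the first updates are small, which can keep the attention layers from diverging before the gradient statistics settle.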

Now I have another problem: my architecture is overfitting...

I tried scaling down some parameters, but it still overfits, and the scaling down hurts performance.

If anyone has insights to share, I'd be glad to hear them!

astariul • May 20 '19 00:05

So, could I know how to solve the overfitting problem? Thanks!! ^_^

zhao1402072392 • Oct 28 '19 11:10