ACProp-Optimizer
Problem training a transformer
I used ACProp to train a transformer, but training breaks down and BLEU stays at 0. Is there a bug in the code? The transformer is from https://github.com/juntang-zhuang/fairseq-adabelief, with lr=5e-4 and 2e-5, eps=1e-16.