skip-thoughts
Issue with Adam implementation
Thanks for releasing the code! I noticed that in skip-thoughts/training/optim.py, the roles of beta1 and beta2 from the paper (https://arxiv.org/pdf/1412.6980.pdf) are swapped with 1-beta1 and 1-beta2 when updating the exponential moving averages m_t and v_t, but not when computing the bias-corrected step size lr_t. I have reproduced your model in PyTorch using the default Adam implementation and the results are comparable. I suspect this is because both (1-beta)^t and beta^t vanish exponentially, so for large t, replacing 1 - (1-beta)^t with 1 - beta^t changes very little. Do you have any other ideas why?
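
To illustrate what I mean, here is a quick numerical sketch (using beta1 = 0.9 as in the paper's defaults; the value is illustrative, not taken from the repo) comparing the paper's bias-correction factor 1 - beta^t with the swapped variant 1 - (1-beta)^t:

```python
beta = 0.9  # paper's beta1 default; illustrative choice

for t in [1, 5, 10, 50, 100]:
    paper = 1 - beta**t            # 1 - beta^t, as in Kingma & Ba
    swapped = 1 - (1 - beta)**t    # 1 - (1-beta)^t, the swapped form
    print(f"t={t:4d}  paper={paper:.6f}  swapped={swapped:.6f}")
```

The two factors differ a lot at t = 1 (0.1 vs 0.9) but both converge to 1 exponentially fast, so after a few dozen updates the step sizes are essentially identical, which would explain why the trained models end up comparable.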