skip-thoughts
Issue with Adam implementation
Thanks for releasing the code! I noticed that in skip-thoughts/training/optim.py, the roles of beta1 and beta2 from the paper (https://arxiv.org/pdf/1412.6980.pdf) are swapped with 1-beta1 and 1-beta2 when updating the exponential moving averages m_t and v_t, but not when computing the bias-corrected step size lr_t. I have reproduced your model in PyTorch using the default Adam implementation and the results are comparable. I suspect this is because both (1-beta)^t and beta^t vanish exponentially, so for large t, replacing 1 - (1-beta)^t with 1 - beta^t changes very little. Do you have any other ideas why?
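
To illustrate what I mean, here is a quick numerical sketch (using beta1 = 0.9 as in the paper's defaults; the value is illustrative, not taken from the repo) comparing the paper's bias-correction factor 1 - beta^t with the swapped variant 1 - (1-beta)^t:

```python
beta = 0.9  # paper's beta1 default; illustrative choice

for t in [1, 5, 10, 50, 100]:
    paper = 1 - beta**t            # 1 - beta^t, as in Kingma & Ba
    swapped = 1 - (1 - beta)**t    # 1 - (1-beta)^t, the swapped form
    print(f"t={t:4d}  paper={paper:.6f}  swapped={swapped:.6f}")
```

The two factors differ a lot at t = 1 (0.1 vs 0.9) but both converge to 1 exponentially fast, so after a few dozen updates the step sizes are essentially identical, which would explain why the trained models end up comparable.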