Denis Vorotyntsev
Hello! I think I found a bug in the Adan optimizer that affects embedding tables. I implemented the Adan optimizer in TensorFlow 2. You can find the implementation [here](https://github.com/DenisVorotyntsev/Adan). I wanted...
The code requires a pass over the full dataset: https://github.com/kyegomez/Sophia/blob/a0ac67586098ac9cef64dc9bdca4e11d003ff502/Sophia/Sophiav2.py#L64, while in the paper the estimate is computed using a [mini-batch](https://arxiv.org/pdf/2305.14342.pdf).
The authors shared pseudocode for the algorithm: https://twitter.com/tengyuma/status/1661413000215920655. The implementation you shared is missing the categorical sampling step.
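To make the point about categorical sampling concrete, here is a minimal NumPy sketch of how I read the mini-batch estimator in the paper: labels are drawn from the model's own softmax output rather than taken from the dataset, and the squared mini-batch gradient is scaled by the batch size. The linear softmax model and the names `softmax` and `gnb_diag_hessian_estimate` are my own assumptions for illustration; this is not the code from the repository or from the authors.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def gnb_diag_hessian_estimate(W, X, rng):
    """Sketch of a mini-batch diagonal Hessian estimate with sampled labels.

    Assumes a toy linear model with logits = X @ W (hypothetical setup).
    W: (d, k) weights, X: (B, d) mini-batch of inputs.
    """
    batch_size = X.shape[0]
    num_classes = W.shape[1]
    probs = softmax(X @ W)

    # Categorical sampling: draw one label per example from the model's
    # predictive distribution (inverse-CDF sampling), not the true labels.
    cdf = probs.cumsum(axis=1)
    u = rng.random((batch_size, 1))
    sampled = (u > cdf).sum(axis=1)
    sampled = np.minimum(sampled, num_classes - 1)  # guard against rounding

    # Gradient of the mean cross-entropy loss w.r.t. W, using sampled labels.
    onehot = np.eye(num_classes)[sampled]
    grad = X.T @ (probs - onehot) / batch_size

    # Scale the element-wise squared gradient by the batch size.
    return batch_size * grad * grad


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))   # one mini-batch, not the full dataset
W = rng.normal(size=(4, 3))
h_hat = gnb_diag_hessian_estimate(W, X, rng)
```

Note that only a single mini-batch `X` is needed per estimate, and that the true labels never enter the computation.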