ML-From-Scratch
Adam mhat and vhat updates
https://github.com/eriklindernoren/ML-From-Scratch/blob/a2806c6732eee8d27762edd6d864e0c179d8e9e8/mlfromscratch/deep_learning/optimizers.py#L125
While updating m_hat and v_hat in Adam, shouldn't we also raise beta to the power of the update number (t) in the bias correction? That is:
m_hat = m / (1 - pow(beta1, t))
v_hat = v / (1 - pow(beta2, t))
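For reference, here is a minimal sketch of an Adam update with the time-step-dependent bias correction from Kingma & Ba (2015). The class and method names loosely follow the repository's optimizer style, but this is an illustrative sketch, not the repository's code:

```python
import numpy as np

class Adam:
    """Minimal Adam optimizer with bias-corrected moment estimates."""
    def __init__(self, learning_rate=0.001, b1=0.9, b2=0.999, eps=1e-8):
        self.learning_rate = learning_rate
        self.b1 = b1
        self.b2 = b2
        self.eps = eps
        self.m = None  # first moment estimate (mean of gradients)
        self.v = None  # second moment estimate (mean of squared gradients)
        self.t = 0     # update counter, needed for the bias correction

    def update(self, w, grad_wrt_w):
        if self.m is None:
            self.m = np.zeros_like(w)
            self.v = np.zeros_like(w)
        self.t += 1
        # Exponential moving averages of the gradient and its square
        self.m = self.b1 * self.m + (1 - self.b1) * grad_wrt_w
        self.v = self.b2 * self.v + (1 - self.b2) * np.power(grad_wrt_w, 2)
        # Bias correction: divide by (1 - beta**t), not (1 - beta),
        # so early updates are not biased toward the zero initialization
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return w - self.learning_rate * m_hat / (np.sqrt(v_hat) + self.eps)
```

Without the exponent t, the correction factor stays constant across updates, whereas in the paper it decays toward 1 as t grows, so the two versions only agree at t = 1.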