Massine Y

3 comments by Massine Y

Looks like a bug: `torch.dot()` only works on 1-D tensors. You could try `torch.sum(grad * u)` instead. Unless you need this urgently, I'd suggest waiting for [the official implementation...
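A minimal sketch of why the swap works, assuming `grad` and `u` are same-shaped tensors (here hypothetical random 2-D tensors standing in for a parameter's gradient and a probe vector). `torch.dot()` rejects 2-D inputs, while `torch.sum(grad * u)` computes the same inner product as flattening both tensors and calling `torch.dot()`:

```python
import torch

torch.manual_seed(0)

# Hypothetical same-shaped 2-D tensors (e.g. a weight matrix's gradient and a probe).
grad = torch.randn(3, 4)
u = torch.randn(3, 4)

# torch.dot() only accepts 1-D tensors, so it raises a RuntimeError here.
try:
    torch.dot(grad, u)
    dot_failed = False
except RuntimeError:
    dot_failed = True

# Element-wise product, then a full sum: the inner product for any matching shape.
inner = torch.sum(grad * u)

# Equivalent alternative: flatten both tensors, then torch.dot() works.
inner_flat = torch.dot(grad.flatten(), u.flatten())

print(dot_failed)
print(torch.allclose(inner, inner_flat))
```

`torch.sum(grad * u)` avoids the reshape and also works unchanged for 1-D, 2-D, or conv-weight (4-D) tensors.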

FYI https://github.com/Liuhong99/Sophia

According to https://arxiv.org/pdf/1701.04128.pdf:

> we place a gradient signal of 1 at the center of the output plane and 0 everywhere else, and then back-propagate this gradient through the network to...
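A rough sketch of the quoted procedure in PyTorch, assuming a hypothetical toy network of three 3x3 convolutions (stride 1, padding 1, no bias, no nonlinearity) and a single random input. This is only an illustration of the "gradient of 1 at the center, 0 elsewhere, back-propagate to the input" idea, not the paper's exact setup (e.g. it does not average over many inputs or weight initializations):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy network: three 3x3 convs, so the theoretical receptive field is 7x7.
net = nn.Sequential(
    nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False),
)

size = 15
x = torch.randn(1, 1, size, size, requires_grad=True)
out = net(x)

# Gradient signal: 1 at the center of the output plane, 0 everywhere else.
grad_signal = torch.zeros_like(out)
c = size // 2
grad_signal[0, 0, c, c] = 1.0

# Back-propagate this signal to the input; x.grad now holds d(out[c, c]) / d(x).
out.backward(grad_signal)

# Magnitude of each input pixel's influence on the center output unit.
influence = x.grad[0, 0].abs()

# Rows/columns with any nonzero influence give the extent of the receptive field.
rows = torch.nonzero(influence.sum(dim=1)).flatten()
cols = torch.nonzero(influence.sum(dim=0)).flatten()
print(rows.tolist(), cols.tolist())
```

Everything outside the 7x7 window around the center gets exactly zero gradient; the paper's point is that, inside that window, the magnitude is concentrated near the center rather than spread uniformly.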