3 comments by Massine Y
Looks like a bug. `torch.dot()` only works on 1D vectors. You could try using `torch.sum(grad * u)` instead. Unless you need this urgently, I'd suggest waiting for [the official implementation...
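
To illustrate the suggestion in this comment, here is a minimal sketch assuming `grad` and `u` are same-shaped 2D tensors. The values are made up for the example and are not from the original thread. It shows `torch.dot()` rejecting non-1D inputs and the element-wise alternative producing the same inner product as flattening first.

```python
import torch

# Hypothetical 2D tensors standing in for the gradient and the probe vector.
grad = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
u = torch.tensor([[0.5, -1.0], [2.0, 0.0]])

# torch.dot() accepts only 1D tensors, so this call raises a RuntimeError.
try:
    torch.dot(grad, u)
    dot_failed = False
except RuntimeError:
    dot_failed = True

# Element-wise product followed by a sum works for any matching shape.
inner = torch.sum(grad * u)

# Equivalent alternative: flatten both tensors to 1D, then use torch.dot().
flat = torch.dot(grad.flatten(), u.flatten())
```

Both `inner` and `flat` hold the same scalar, so either form should work as a drop-in replacement when the tensors are not 1D.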
FYI https://github.com/Liuhong99/Sophia
According to https://arxiv.org/pdf/1701.04128.pdf: >we place a gradient signal of 1 at the center of the output plane and 0 everywhere else, and then back-propagate this gradient through the network to...
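
The quoted procedure can be sketched in PyTorch roughly as follows. This is an illustrative toy, not the paper's code: the two-layer network `net`, the 15×15 input, and all variable names are assumptions made for the example. The sketch places a gradient of 1 at the center of the output plane, back-propagates it, and then looks at the gradient that reaches the input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy network: two 3x3 convolutions with padding, so the output
# plane has the same spatial size as the input.
net = nn.Sequential(
    nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False),
)

x = torch.randn(1, 1, 15, 15, requires_grad=True)
out = net(x)

# Gradient signal: 1 at the center of the output plane, 0 everywhere else.
grad_signal = torch.zeros_like(out)
center_row, center_col = out.shape[2] // 2, out.shape[3] // 2
grad_signal[0, 0, center_row, center_col] = 1.0

# Back-propagate this signal through the network to the input.
out.backward(grad_signal)

# Magnitude of the gradient at each input pixel.
input_grad = x.grad[0, 0].abs()
```

With two stacked 3×3 convolutions, only the 5×5 input patch around the center can influence the center output, so `input_grad` is zero outside that patch.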