zhangtj1996
Results: 3 issues of zhangtj1996
Hi, I'm wondering what `mu[i] * self._T` means in the first part of the likelihood. It doesn't seem consistent with the paper, where this term should be lambda * delta t.
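For reference, here is a sketch of where a term like `mu[i] * self._T` could come from. It assumes (the source does not confirm this) that the model is a point process with a constant baseline intensity $\mu_i$ observed on a window $[0, T]$, and that `self._T` holds the window length $T$:

```latex
\log L_i \;=\; \sum_{k} \log \lambda_i(t_k) \;-\; \int_0^T \lambda_i(t)\,dt,
\qquad
\int_0^T \mu_i \,dt \;=\; \mu_i T .
```

Under that assumption, a discretized form $\sum_k \lambda_i(t_k)\,\Delta t$ approximates the same integral, and its constant part is $\mu_i \sum_k \Delta t = \mu_i T$ whenever the steps cover the whole window. Whether the code and the paper actually match in this way is exactly what the question asks.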
Optimizers such as SGD with momentum, Adam, and RMSprop may use historical gradient information. Does this implementation maintain, reset, or interpolate the momentum in each outer loop?
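To make the three options concrete, here is a minimal sketch in NumPy. It is not taken from the implementation in question: `inner_loop`, `outer_step`, the `policy` argument, and the Reptile-style parameter interpolation with `meta_lr` are all hypothetical names and assumptions chosen for illustration.

```python
import numpy as np


def inner_loop(theta, velocity, grad_fn, steps=5, lr=0.1, beta=0.9):
    """Run a few SGD+momentum steps; return the updated parameters and velocity."""
    theta = theta.copy()
    velocity = velocity.copy()
    for _ in range(steps):
        velocity = beta * velocity + grad_fn(theta)  # accumulate gradient history
        theta = theta - lr * velocity
    return theta, velocity


def outer_step(theta, velocity, grad_fn, policy, meta_lr=0.5):
    """One outer iteration with a hypothetical policy for the momentum buffer."""
    new_theta, new_velocity = inner_loop(theta, velocity, grad_fn)
    # Assumption: parameters are interpolated toward the inner-loop result.
    theta = theta + meta_lr * (new_theta - theta)
    if policy == "maintain":
        # Carry the inner-loop momentum into the next outer iteration.
        velocity = new_velocity
    elif policy == "reset":
        # Start every outer iteration with a zeroed momentum buffer.
        velocity = np.zeros_like(velocity)
    elif policy == "interpolate":
        # Blend old and new momentum the same way as the parameters.
        velocity = velocity + meta_lr * (new_velocity - velocity)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return theta, velocity


if __name__ == "__main__":
    def quadratic_grad(t):
        # Gradient of 0.5 * ||t||^2, used as a toy objective.
        return t

    for policy in ("maintain", "reset", "interpolate"):
        theta, velocity = np.array([1.0, -2.0]), np.zeros(2)
        for _ in range(3):
            theta, velocity = outer_step(theta, velocity, quadratic_grad, policy)
        print(policy, theta, velocity)
```

The same question applies to Adam and RMSprop, where the carried state would be the first- and second-moment estimates rather than a single velocity buffer.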
Will RDC for redundancy be supported in the near future?