GuruGuha

Results 4 comments of GuruGuha

Because autograd.grad takes a sequence (tuple) of tensors as inputs (the tensors w.r.t. which the gradient of the output is to be computed), the return value is also a sequence (tuple) of gradient tensors w.r.t...
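
A minimal sketch of that tuple-in, tuple-out behavior, assuming PyTorch is installed. The tensors `x` and `w` and their values are made up for illustration and do not come from the original thread.

```python
import torch

# Hypothetical leaf tensors; the names and values are illustrative only.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

# Scalar output that depends on both tensors.
y = (w * x).sum()

# Passing a tuple of inputs returns a tuple of gradients, one per input,
# in the same order as the inputs.
grads = torch.autograd.grad(y, (x, w))
grad_x, grad_w = grads  # dy/dx equals w, dy/dw equals x

# Even with a single tensor as input, the result is a 1-tuple,
# so it needs unpacking (or indexing with [0]).
y2 = (x ** 2).sum()
(grad_x2,) = torch.autograd.grad(y2, x)  # dy2/dx equals 2 * x
```

Unpacking with `(g,) = ...` is one way to handle the single-input case; indexing the result with `[0]` should work equally well.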

Did you try training the darknet_53 model using Adam? Curiously, I notice the convergence is a lot worse than it is with SGD... this is contrary to my...