
Possible Error

Opened by aalbiol • 0 comments

In the notebook named "How does a neural net really work?", there is a point where the parameters of a parabola are found using gradients. There is a cell with this content:

```python
for i in range(10):
    loss = quad_mae(abc)
    loss.backward()
    with torch.no_grad(): abc -= abc.grad*0.01
    print(f'step={i}; loss={loss:.2f}')
```
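(For readers without the notebook open: `quad_mae` computes the mean absolute error between a quadratic with parameters `abc` and a set of noisy sample points. A minimal, self-contained sketch of that setup, which may differ in details from the notebook's code, is:)

```python
import torch

# Noisy samples of a target parabola (stand-in for the notebook's data)
x = torch.linspace(-2, 2, 20)
y = 3*x**2 + 2*x + 1 + torch.randn(20)*0.1

def quad(a, b, c, x):
    return a*x**2 + b*x + c

def quad_mae(params):
    # Mean absolute error of the quadratic defined by params against the samples
    a, b, c = params
    return torch.abs(quad(a, b, c, x) - y).mean()

abc = torch.tensor([1.1, 1.1, 1.1], requires_grad=True)
```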

If you run this loop for more than 10 iterations the loss starts growing again.

In the text, it's said that this is because the learning rate must be progressively decreased in practice. In my opinion, it is because every time loss.backward() is executed the gradients are accumulated rather than recomputed. If the gradients are reset to zero after each iteration, the loop converges to a minimum.

Proposed code:

```python
for i in range(10):
    loss = quad_mae(abc)
    loss.backward()
    with torch.no_grad(): abc -= abc.grad*0.01
    abc.grad.fill_(0)  # New line: reset the accumulated gradients
    print(f'step={i}; loss={loss:.2f}')
```
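A standalone illustration of the accumulation behaviour (my own minimal sketch, not code from the notebook):

```python
import torch

# backward() adds to .grad instead of overwriting it
w = torch.tensor([3.0], requires_grad=True)

loss = (w**2).sum()
loss.backward()
print(w.grad)          # tensor([6.])  -- the true gradient 2*w

loss = (w**2).sum()
loss.backward()        # gradient is added on top of the previous one
print(w.grad)          # tensor([12.]) -- twice the true gradient

w.grad.zero_()         # resetting restores the correct per-step gradient
loss = (w**2).sum()
loss.backward()
print(w.grad)          # tensor([6.])
```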

Let me conclude by congratulating you on this very clear explanation.

Regards

aalbiol (Mar 09 '23)