💡 [REQUEST] - Clarification of requires_grad in beginner/nn_tutorial.html
🚀 Describe the improvement or the new tutorial
It was initially challenging for me to grasp why `requires_grad` is set on a separate line after the `weights` initialization, but passed inline when creating `bias`, in https://docs.pytorch.org/tutorials/beginner/nn_tutorial.html#neural-net-from-scratch-without-torch-nn
At first glance, the code looks inconsistent (see the snippet below):

- `weights` initialization is split into two lines.
- `bias` initialization is done in one line.
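For reference, the initialization in that section looks roughly like this (shapes are from the tutorial's MNIST example; quoted from memory, so minor details may differ):

```python
import math

import torch

# weights: two lines -- initialize, then turn on gradient tracking in place
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()

# bias: one line -- gradient tracking requested directly in the factory call
bias = torch.zeros(10, requires_grad=True)
```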
The Logic Gap

The tutorial currently shows what is done, but not exactly why the distinction exists between these two specific variables.
- The `bias` is created using a factory function (`torch.zeros`) with no subsequent mathematical operations. It is born as a "Leaf Node" (a source parameter).
- The `weights` involve a mathematical operation (`/ math.sqrt(...)`). If we set `requires_grad=True` inside `torch.randn()`, PyTorch records the division as a computational step. The resulting `weights` variable becomes a non-leaf node (a calculated outcome), which the optimizer cannot update. The sketch after this list demonstrates the difference.
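A minimal sketch of the leaf/non-leaf distinction (the variable names `w_bad` and `w_good` are mine, purely for illustration):

```python
import math

import torch

# requires_grad=True inside the factory call, followed by math:
# the division is recorded, so the result is NOT a leaf node.
w_bad = torch.randn(784, 10, requires_grad=True) / math.sqrt(784)
print(w_bad.is_leaf)   # False
print(w_bad.grad_fn)   # <DivBackward0 ...> -- it is a computed result

# Math first, then requires_grad_() on the finished tensor:
# the tensor stays a leaf node, so .grad gets populated and it can be updated.
w_good = torch.randn(784, 10) / math.sqrt(784)
w_good.requires_grad_()
print(w_good.is_leaf)  # True

# bias has no post-creation math, so the inline flag is already safe.
bias = torch.zeros(10, requires_grad=True)
print(bias.is_leaf)    # True
```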
Proposed Improvement
I propose modifying the comment block to explicitly mention that `requires_grad` must be deferred until after the initialization math is complete, so the tensor remains a trainable parameter (Leaf Node).
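One possible shape for that comment block (the wording is only a suggestion, reusing the tutorial's existing initialization):

```python
# We set requires_grad AFTER the initialization math on purpose:
# calling requires_grad_() on the finished tensor keeps `weights` a leaf node,
# so autograd populates weights.grad and the training loop can update it.
# `bias` needs no post-creation math, so requires_grad=True can be passed
# directly to the torch.zeros() call.
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)
```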
Existing tutorials on this topic
- https://docs.pytorch.org/tutorials/beginner/nn_tutorial.html
- https://docs.pytorch.org/tutorials/beginner/nn_tutorial.html#neural-net-from-scratch-without-torch-nn