pytorch-pruning
About gradient
Excuse me, there is something about the gradient I don't understand: why is the gradient's shape the same as the activation output's shape? Isn't the gradient the weight gradient, with shape [o, i, 3, 3]?
The gradient here is the gradient of the loss with respect to each one of the activation outputs, not with respect to the weights. Therefore the gradient's shape is the same as the activation output's shape.
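A minimal standalone sketch (illustrative, not the repo's actual code) that makes the shape difference concrete; `retain_grad()` is used here just to expose d(loss)/d(activation) after the backward pass:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(1, 3, 32, 32)

activation = conv(x)      # activation output, shape [1, 8, 32, 32]
activation.retain_grad()  # keep d(loss)/d(activation) around after backward()

loss = activation.sum()   # stand-in for a real training loss
loss.backward()

# Gradient w.r.t. the activation: same shape as the activation output.
print(activation.grad.shape)   # torch.Size([1, 8, 32, 32])

# Gradient w.r.t. the weights: the [o, i, 3, 3] shape from the question.
print(conv.weight.grad.shape)  # torch.Size([8, 3, 3, 3])
```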
@jacobgil, this part is difficult to understand. For example, gradient(final_loss, layer_weight) would mean the gradient of the loss with respect to the layer weights, so the result would keep the weights' dimensions. According to your comment, is final_loss the output (that is, the x = module(x) in your code)? And is layer_weight each one of the activation outputs (what is this, and can I find the corresponding variable in your code)? Thank you very much.
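In autograd terms, what the answer above seems to describe is gradient(final_loss, activation) rather than gradient(final_loss, layer_weight). A hedged sketch (variable names are illustrative assumptions, not taken from the repo):

```python
import torch
import torch.nn as nn

module = nn.Conv2d(3, 8, kernel_size=3, padding=1)
inputs = torch.randn(2, 3, 32, 32)

x = module(inputs)            # the x = module(x) step: the activation output
final_loss = x.pow(2).mean()  # stand-in for the actual training loss

# d(final_loss)/d(activation): same shape as the activation, not the weights.
(grad_wrt_activation,) = torch.autograd.grad(final_loss, x)
print(grad_wrt_activation.shape)  # torch.Size([2, 8, 32, 32])
```

If memory serves, the repo then multiplies this gradient elementwise with the activation and averages the absolute values to rank filters, following the Taylor criterion from the Molchanov et al. paper it implements.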