For addition, accumulating the gradient makes sense, but I can't make sense of incrementing it for multiplication too. Potential bug?
def __mul__(self, other):
    other = other if isinstance(other, Value) else Value(other)
    out = Value(self.data * other.data, (self, other), '*')

    def _backward():
        self.grad += other.data * out.grad
        other.grad += self.data * out.grad
    out._backward = _backward

    return out
If you have an expression of the form (x*y)*(x*z), then the gradient w.r.t. x is not additive, right?
I don't get what you mean by the expression '(x*y)*(x*z)', but here is the logic behind incrementing the previous value of the gradient:
Consider an expression like y = (a * b) + (a * c).
When we backpropagate through the first product (a * b), its contribution to the gradient of y with respect to a is b * out.grad (in this example out.grad is 1 at that point), and its contribution to the gradient of y with respect to b is a * out.grad.
So what we currently have is:
a.grad = b
b.grad = a
Then when we are trying to evaluate the second expression (a * c) by a similar procedure, we find
c.grad = a
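but here we should not say a.grad = c. Because a also feeds into the first product, overwriting would throw away the contribution we already recorded from (a * b). We should increment the previous a.grad by c instead. So, a.grad += c.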
In the end we should have:
a.grad = b + c
b.grad = a
c.grad = a
This is exactly what regular calculus gives: dy/da = b + c, dy/db = a, dy/dc = a.
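If it helps, here is a minimal sketch you can run to check this numerically. It is not the library's actual code: the Value class below is a stripped-down stand-in I wrote around the snippet quoted in the question, and the __add__ and backward methods, as well as the variable names x, w and z, are my own assumptions for illustration. It checks y = (a * b) + (a * c), and also a product of two products that share a variable, where the accumulated gradient still matches calculus.

```python
class Value:
    """Stripped-down scalar autograd node (illustrative stand-in only)."""

    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0                  # gradients start at zero and are accumulated
        self._backward = lambda: None    # leaf nodes have nothing to propagate
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')

        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')

        def _backward():
            # += so that every use of a node adds its own contribution
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Visit nodes in reverse topological order so that out.grad is
        # complete before it is pushed down to the node's inputs.
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()


# y = (a * b) + (a * c): a is used twice, so its gradient is b + c
a, b, c = Value(2.0), Value(3.0), Value(5.0)
y = (a * b) + (a * c)
y.backward()
print(a.grad, b.grad, c.grad)  # 8.0 2.0 2.0

# p = (x * w) * (x * z) = x^2 * w * z, so dp/dx = 2 * x * w * z
x, w, z = Value(2.0), Value(3.0), Value(4.0)
p = (x * w) * (x * z)
p.backward()
print(x.grad)  # 48.0
```

If you change the two += in __mul__ to plain =, a.grad comes out as either 3.0 or 5.0 depending on which product is visited last, instead of 8.0, which is the bug the accumulation prevents.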
I hope this clears things up for you.