Loss functions and their derivatives
Hi, I read your article about creating a neural network and ended up here. I am looking into modularizing my code a bit and keep stumbling over a problem. First, am I understanding this line correctly: https://github.com/vzhou842/neural-network-from-scratch/blob/498631c08c2ca8c37107c1f8f6f18fee393e7dde/network.py#L73
Is this the derivative of the cost function (MSE in this tutorial), or how does one end up here? Does that mean that a cost function must always exist in its "forward" form and in its backward (derived) form? (The latter being used to initialize the starting gradient when training.)
Either I am misunderstanding this, or it's hard to find the derivatives of well-known cost functions (it has been a while since I have done this myself, and math notation is a bit blurry to me). To make the question concrete, a sketch of what I mean by modularizing follows below.
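Here is roughly what I have in mind (the class and method names are my own, not from your code):

```python
# Sketch of a "modular" loss (class and method names are my own, not from the
# tutorial): the "forward" form computes the per-example loss, and the
# "backward" (derived) form gives dL/dy_pred, the starting gradient for
# backpropagation.
class MSELoss:
    def forward(self, y_true, y_pred):
        # per-example squared error
        return (y_true - y_pred) ** 2

    def backward(self, y_true, y_pred):
        # derivative of the squared error with respect to y_pred
        return -2 * (y_true - y_pred)
```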
The article was interested in calculating $\frac{\partial L}{\partial w_1}$, which tells us how the loss changes with respect to $w_1$. Using the chain rule, this becomes:
$$ \frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} \times \frac{\partial y_{pred}}{\partial w_1} $$
$L$ is the loss function, which here is $(y_{true} - y_{pred})^2$. $\frac{\partial L}{\partial y_{pred}}$ is its derivative with respect to $y_{pred}$, which is the line of code you highlighted: $-2 \times (y_{true} - y_{pred})$.
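A quick way to sanity-check that derivative is with a finite-difference approximation (an illustrative sketch, not code from the article):

```python
# Finite-difference check of dL/dy_pred = -2 * (y_true - y_pred)
# (illustrative only, not code from the article).
def loss(y_true, y_pred):
    return (y_true - y_pred) ** 2

y_true, y_pred, eps = 1.0, 0.7, 1e-6
numeric = (loss(y_true, y_pred + eps) - loss(y_true, y_pred - eps)) / (2 * eps)
analytic = -2 * (y_true - y_pred)
print(numeric, analytic)  # both are approximately -0.6
```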
What I don't understand is why this derivative has a factor of $-2$ instead of $2$.