Loss functions and their derivatives
Hi, I read your article about creating a neural network and ended up here. I am looking into modularizing my code a bit and keep stumbling over a problem. First, am I understanding this line correctly: https://github.com/vzhou842/neural-network-from-scratch/blob/498631c08c2ca8c37107c1f8f6f18fee393e7dde/network.py#L73
Is this the derivative of the cost function (MSE in this tutorial), or how does one end up here? Does that mean that a cost function must always exist in its "forward" form and in its backward (derived) form? (The latter being used to initialize the starting gradient when training.)
Either I am misunderstanding this, or it's hard to find the derivatives of well-known cost functions (it has been a while since I have done this myself, and math notation is a bit blurry to me). To make the question concrete, a sketch of what I mean by modularizing follows below.
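Here is roughly what I have in mind (the class and method names are my own, not from your code):

```python
# Sketch of a "modular" loss (class and method names are my own, not from the
# tutorial): the "forward" form computes the per-example loss, and the
# "backward" (derived) form gives dL/dy_pred, the starting gradient for
# backpropagation.
class MSELoss:
    def forward(self, y_true, y_pred):
        # per-example squared error
        return (y_true - y_pred) ** 2

    def backward(self, y_true, y_pred):
        # derivative of the squared error with respect to y_pred
        return -2 * (y_true - y_pred)
```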
The article was interested in calculating $\frac{\partial L}{\partial w_1}$, which tells us how the loss changes with respect to $w_1$. Using the chain rule, this becomes:
$$ \frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} \times \frac{\partial y_{pred}}{\partial w_1} $$
$L$ is the loss function, which here is $(y_{true} - y_{pred})^2$. $\frac{\partial L}{\partial y_{pred}}$ is its derivative with respect to $y_{pred}$, which is the line of code you highlighted: $-2 \times (y_{true} - y_{pred})$.
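A quick way to sanity-check that derivative is with a finite-difference approximation (an illustrative sketch, not code from the article):

```python
# Finite-difference check of dL/dy_pred = -2 * (y_true - y_pred)
# (illustrative only, not code from the article).
def loss(y_true, y_pred):
    return (y_true - y_pred) ** 2

y_true, y_pred, eps = 1.0, 0.7, 1e-6
numeric = (loss(y_true, y_pred + eps) - loss(y_true, y_pred - eps)) / (2 * eps)
analytic = -2 * (y_true - y_pred)
print(numeric, analytic)  # both are approximately -0.6
```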
What I don't understand is why this derivative has a factor of $-2$ instead of $2$.