CS224n
Assignment 1 q2_neural.py softmax gradient not explicitly calculated
In the backward pass, the gradient of the softmax function is not computed with the formula derived in the lecture notes. It looks like the code skips that step and only uses the gradient of the cost function with respect to yhat (the 'd3' variable). Am I missing something here?
There is nothing wrong with the backprop code. Cross-entropy loss combined with a softmax output has a very simple derivative with respect to the pre-softmax scores: yhat - labels, so the softmax gradient and the loss gradient collapse into that single expression. You can find more details at https://deepnotes.io/softmax-crossentropy
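As a quick sanity check, here is a minimal sketch (not the assignment's actual code; the softmax and cross_entropy helpers below are my own) comparing the yhat - labels shortcut against a finite-difference gradient of the loss with respect to the scores:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(z, labels):
    # Cross-entropy of a one-hot target against softmax(z)
    return -np.sum(labels * np.log(softmax(z)))

rng = np.random.default_rng(0)
z = rng.normal(size=5)                # pre-softmax scores for one example
labels = np.zeros(5)
labels[2] = 1.0                       # one-hot target

analytic = softmax(z) - labels        # the "d3" shortcut: yhat - labels

# Finite-difference check of dJ/dz
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(len(z)):
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[i] += eps
    z_minus[i] -= eps
    numeric[i] = (cross_entropy(z_plus, labels) - cross_entropy(z_minus, labels)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # ~1e-10: the shortcut matches
```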
Hope this helps.