Coursera-Machine-Learning
Matrix sum in Neural network's cost function
Hi Jordi,
First of all, thanks so much for the notebooks. They really help me follow along with the course.
I have one question about your notebook 4, nnCostFunction, where J = ... np.sum((np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix))).
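To make the shapes concrete, here is a minimal NumPy sketch of how that line evaluates. It assumes (this is not taken from the notebook) that a3 and y_matrix are plain NumPy arrays, that a3 has shape (n_labels, m) with one column of outputs per training example, and that y_matrix is one-hot with shape (m, n_labels). The sizes, the seed and the random values are made up for illustration.

```python
import numpy as np

# Hypothetical sizes: m training examples and n_labels output classes.
m, n_labels = 5, 3
rng = np.random.default_rng(0)

# Assumed layout: a3 is (n_labels, m), one column of sigmoid outputs per example.
a3 = rng.uniform(0.05, 0.95, size=(n_labels, m))

# Hypothetical integer labels 0..n_labels-1, one-hot encoded to shape (m, n_labels).
y = rng.integers(0, n_labels, size=m)
y_matrix = np.eye(n_labels)[y]

# With NumPy arrays, `*` is element-wise, so both operands are (m, n_labels).
term = np.log(a3.T) * y_matrix + np.log(1 - a3).T * (1 - y_matrix)
print(term.shape)        # (5, 3): one entry per (example, label) pair

# np.sum adds all m * n_labels entries and returns a single scalar.
J_unreg = -1 * (1 / m) * np.sum(term)
print(np.ndim(J_unreg))  # 0
```

If a3 and y_matrix were np.matrix objects instead, `*` would mean matrix multiplication, so the array type matters when reading this line.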
I think this does matrix multiplication, giving a 10 × 10 (or n_label × n_label) matrix; let's call this cost matrix Jc. Jc contains not only how the predicted values for one label differ from their corresponding targets (diagonal elements), but also how they differ from the targets of the other labels (off-diagonal elements). For example, the multiplication would multiply the column of predicted values in np.log(a3.T) for one label (e.g. k) with every column of targets.
Then the code sums all elements of this matrix. This seems to over-calculate J. Instead of summing all the elements, I think only the diagonal elements are needed.
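The cost-matrix reading can be checked numerically. The sketch below builds Jc with an explicit matrix product (@) to model that reading; it is an illustration, not what the notebook line does if a3 and y_matrix are plain NumPy arrays, where `*` is element-wise. Shapes are assumptions: a3 is taken to be (n_labels, m) and y_matrix one-hot (m, n_labels), with made-up sizes and random values.

```python
import numpy as np

# Hypothetical sizes and random data, for illustration only.
m, n_labels = 5, 3
rng = np.random.default_rng(1)
a3 = rng.uniform(0.05, 0.95, size=(n_labels, m))                # assumed (n_labels, m)
y_matrix = np.eye(n_labels)[rng.integers(0, n_labels, size=m)]  # one-hot (m, n_labels)

# "Cost matrix" under the matrix-multiplication reading: entry (k, j) pairs
# the predictions for label k with the targets for label j.
Jc = np.log(a3) @ y_matrix + np.log(1 - a3) @ (1 - y_matrix)   # (n_labels, n_labels)

# Element-wise version, as written in the notebook line (with NumPy arrays).
term = np.log(a3.T) * y_matrix + np.log(1 - a3).T * (1 - y_matrix)  # (m, n_labels)

diag_sum = np.trace(Jc)          # only the matching label-k / target-k pairs
elementwise_sum = np.sum(term)   # every (example, label) entry

print(Jc.shape)                                  # (3, 3)
print(np.isclose(diag_sum, elementwise_sum))     # True
print(np.isclose(np.sum(Jc), elementwise_sum))   # False: off-diagonal cross terms
```

Diagonal entry (k, k) of Jc is the sum over examples of column k of the element-wise product, so the trace of Jc equals np.sum of the element-wise product. Summing all of Jc would add the cross-label terms as well. The same holds for both the y_matrix term and the (1 - y_matrix) term.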
Please refer to this picture, which accompanies my description, in case the description is confusing.
Please let me know if I misunderstood the code.
Best regards and thanks again, -Tua
Hi Tua,
The code you refer to above is the implementation of the Regularized Cost Function shown just above the code in the notebook and in section 1.4 of the Coursera exercise document. It will return a single, scalar value (not a matrix) assigned to variable J.
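For reference, the regularized cost for a three-layer network can be written as follows. This is a transcription in standard notation, not copied from the notebook: s_1 and s_2 are the input and hidden layer sizes, K is the number of labels, and the bias column (k = 0) of each Θ is left out of the penalty. In the exercise, s_1 = 400, s_2 = 25 and K = 10.

```latex
J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[-y_k^{(i)}\log\big((h_\Theta(x^{(i)}))_k\big) - \big(1-y_k^{(i)}\big)\log\big(1-(h_\Theta(x^{(i)}))_k\big)\Big]
  + \frac{\lambda}{2m}\Big[\sum_{j=1}^{s_2}\sum_{k=1}^{s_1}\big(\Theta^{(1)}_{j,k}\big)^2 + \sum_{j=1}^{K}\sum_{k=1}^{s_2}\big(\Theta^{(2)}_{j,k}\big)^2\Big]
```

The double sum pairs each target y_k^{(i)} only with the matching output (h_Θ(x^{(i)}))_k, so it has m · K terms, one per (example, label) pair, and no cross-label terms.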
I am not sure I understand what you mean by 'over-calculating' the cost J.
Hi Jordi,
I understand that the code is the implementation of the Regularized Cost Function shown above it.
What I meant is that, in J = -1*(1/m)*np.sum((np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix))), I think np.sum should be replaced with a sum over only the diagonal elements of (np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix)). np.sum sums all the elements of, e.g., the output matrix in the image below, so the result would not match the Regularized Cost Function the code refers to.
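To see what summing only the diagonal would do if applied literally to that expression, the sketch below assumes that a3 and y_matrix are plain NumPy arrays, with a3 of shape (n_labels, m) and y_matrix one-hot of shape (m, n_labels). Under that assumption the expression is an element-wise product of shape (m, n_labels), which is generally not square. The sizes and values are made up.

```python
import numpy as np

# Hypothetical sizes and random data, for illustration only.
m, n_labels = 5, 3
rng = np.random.default_rng(2)
a3 = rng.uniform(0.05, 0.95, size=(n_labels, m))                # assumed (n_labels, m)
y_matrix = np.eye(n_labels)[rng.integers(0, n_labels, size=m)]  # one-hot (m, n_labels)

# With NumPy arrays the bracketed expression is element-wise: (m, n_labels), not square.
term = np.log(a3.T) * y_matrix + np.log(1 - a3).T * (1 - y_matrix)

# The "diagonal" of a non-square array has only min(m, n_labels) entries:
# (example 0, label 0), (example 1, label 1), ... and the rest is dropped.
print(np.diagonal(term).shape)    # (3,)

J_all = -np.sum(term) / m         # all m * n_labels terms of the double sum
J_diag = -np.trace(term) / m      # only the min(m, n_labels) diagonal terms
print(np.isclose(J_all, J_diag))  # False
```

Summing the diagonal matches np.sum only for the matrix-product form Jc. On the element-wise array it discards most of the (example, label) terms.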
For simplicity, I only wrote out the output of np.log(a3.T)*(y_matrix), but the same argument applies to np.log(1-a3).T*(1-y_matrix).
Please let me know your thoughts.
Best regards, -Tua