
Matrix sum in Neural network's cost function

Open • clumdee opened this issue on Aug 29, 2017 • 2 comments

Hi Jordi,

First of all, thanks so much for the notebooks. They really help me follow along with the course. I have one question about your notebook 4 (nnCostFunction), where J = ... np.sum((np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix))).

I think this does matrix multiplication, giving a 10x10 matrix (or n_label x n_label). Let's call this cost matrix Jc. The Jc matrix contains not only how the set of predicted values for one label differs from its corresponding target (the diagonal elements), but also how it differs from the targets of the other labels (the off-diagonal elements). For example, the multiplication multiplies a column of predicted values np.log(a3.T) for one label (e.g. k) with all columns of targets.

Then the code sums all elements of this matrix. This seems to over-calculate J. Instead of summing all the elements, I think only the diagonal elements are needed.
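To make this concrete, here is a small sketch with made-up shapes and random values. In the notebook a3 and y_matrix are np.matrix objects of shape (m, K) as I read it, so * acts as a matrix product; here I use plain arrays and @ to show the same product explicitly:

```python
import numpy as np

# Toy shapes: m examples, K labels (the notebook uses m = 5000, K = 10).
m, K = 5, 3
rng = np.random.default_rng(0)
a3 = rng.uniform(0.05, 0.95, size=(m, K))         # hypothetical predictions, shape (m, K)
y_matrix = np.eye(K)[rng.integers(0, K, size=m)]  # hypothetical one-hot targets, shape (m, K)

Jc = np.log(a3).T @ y_matrix     # matrix product -> K x K "cost matrix"
full_sum = Jc.sum()              # sums diagonal AND off-diagonal (cross-label) terms
diag_sum = np.trace(Jc)          # sums only the diagonal terms
elementwise = (y_matrix * np.log(a3)).sum()  # sum_i sum_k y_ik * log(a3_ik)

print(np.isclose(diag_sum, elementwise))  # True: the diagonal sum matches the element-wise definition
print(np.isclose(full_sum, elementwise))  # generally False: the full sum adds the cross-label terms
```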

Please use the attached picture (img_20170829_155209) to supplement my description, which might be confusing.

Please let me know if I misunderstood the code.

Best regards and thanks again, -Tua

clumdee • Aug 29 '17, 14:08

Hi Tua,

The code you refer to above is the implementation of the Regularized Cost Function shown just above the code in the notebook and in section 1.4 of the Coursera exercise document. It returns a single scalar value (not a matrix), assigned to the variable J.

I am not sure I understand what you mean by 'over-calculating' the cost J.

JWarmenhoven • Aug 29 '17, 20:08

Hi Jordi,

I understand that the code is the implementation of the Regularized Cost Function shown above it.

What I meant is that I think the np.sum in J = -1*(1/m)*np.sum((np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix))) should be replaced by a sum over only the diagonal elements of (np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix)). np.sum adds up all the elements of the resulting matrix (e.g. the output matrix in the image below), and that is not the same as the Regularized Cost Function the code refers to.

For simplicity, I only wrote out the output of np.log(a3.T)*(y_matrix) in the attached picture (img_20170829_223739), but the same argument applies to np.log(1-a3).T*(1-y_matrix).
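For concreteness, here is a rough sketch of the change I am suggesting, assuming a3 and y_matrix are np.matrix objects of shape (m, K) (so * does matrix multiplication), which is how I read the notebook; the regularization term is left out:

```python
import numpy as np

def cost_full_sum(a3, y_matrix, m):
    # Form quoted above (unregularized part): matrix products, then np.sum over ALL K x K entries.
    return -1 * (1 / m) * np.sum(np.log(a3.T) * y_matrix + np.log(1 - a3).T * (1 - y_matrix))

def cost_diagonal_sum(a3, y_matrix, m):
    # Suggested change: same matrix products, but sum only the diagonal elements (the trace).
    Jc = np.log(a3.T) * y_matrix + np.log(1 - a3).T * (1 - y_matrix)
    return -1 * (1 / m) * float(np.trace(Jc))

def cost_elementwise(a3, y_matrix, m):
    # Equivalent element-wise form of the (unregularized) cost from the exercise text.
    return -1 * (1 / m) * np.sum(np.multiply(np.log(a3), y_matrix)
                                 + np.multiply(np.log(1 - a3), 1 - y_matrix))
```

cost_diagonal_sum and cost_elementwise should agree, while cost_full_sum will generally differ because it also adds the off-diagonal (cross-label) terms.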

Please let me know your thoughts.

Best regards, -Tua

clumdee • Aug 29 '17, 20:08