
Something wrong with the code in chapter 10

Open · Jason-XII opened this issue 1 year ago · 0 comments

I've been reading the book and strictly following the code examples, but I think there's something wrong with the code in chapter 10, where a CNN is trained to recognize the MNIST images. In the last part of the code, when updating the weights:

layer_2_delta = (labels[batch_start:batch_end]-layer_2) / (batch_size*layer_2.shape[0])
layer_1_delta = layer_2_delta.dot(weights_1_2.T)*tanh2deriv(layer_1)
layer_1_delta*=dropout_mask
weights_1_2 += alpha*layer_1.T.dot(layer_2_delta)
l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
k_update = flattened_input.T.dot(l1d_reshape)
kernels -= alpha*k_update

I'm somewhat surprised, because according to what I learned earlier in the book, the layer_x_delta values should be the negative derivatives of the loss function, so I think the last line should be

kernels += alpha*k_update
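
To make my reasoning concrete, here is a tiny standalone sketch (my own toy example, not code from the book) of the sign convention I mean: when the delta is defined as (target - prediction), it already carries the minus sign of the loss gradient, so plain gradient descent uses "+=".

import numpy as np

# Toy example (mine, not the book's): a single linear layer trained with the
# same sign convention as layer_2_delta, i.e. delta = (target - prediction).
np.random.seed(0)
x = np.random.randn(8, 3)              # 8 samples, 3 features
true_w = np.array([[1.0], [-2.0], [0.5]])
y = x.dot(true_w)                      # targets

w = np.zeros((3, 1))
alpha = 0.1
for _ in range(500):
    pred = x.dot(w)
    delta = (y - pred) / len(x)        # negative gradient direction, like layer_2_delta
    w += alpha * x.T.dot(delta)        # "+=" because delta already contains the minus sign

print(np.round(w, 2))                  # settles close to true_w

If the delta were defined the other way around, as (prediction - target), then "-=" would be the right update, which is why the kernel line looks inconsistent with the rest of the block to me.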

After making this change, I tried it on my own computer. The output:

I:0 Train-Acc: 0.132
I:1 Train-Acc: 0.174
I:2 Train-Acc: 0.191
I:3 Train-Acc: 0.215
I:4 Train-Acc: 0.241
I:5 Train-Acc: 0.249
I:6 Train-Acc: 0.296
I:7 Train-Acc: 0.31
I:8 Train-Acc: 0.37
I:9 Train-Acc: 0.358
I:10 Train-Acc: 0.408
I:11 Train-Acc: 0.438
I:12 Train-Acc: 0.465
I:13 Train-Acc: 0.479
I:14 Train-Acc: 0.528
I:15 Train-Acc: 0.548
I:16 Train-Acc: 0.533
I:17 Train-Acc: 0.569
I:18 Train-Acc: 0.574
I:19 Train-Acc: 0.605
I:20 Train-Acc: 0.605
...

But with the original code, I get:

I:0 Train-Acc: 0.055
I:1 Train-Acc: 0.037
I:2 Train-Acc: 0.037
I:3 Train-Acc: 0.04
I:4 Train-Acc: 0.046
I:5 Train-Acc: 0.068
I:6 Train-Acc: 0.083
I:7 Train-Acc: 0.096
I:8 Train-Acc: 0.127
I:9 Train-Acc: 0.148
I:10 Train-Acc: 0.181
I:11 Train-Acc: 0.209
I:12 Train-Acc: 0.238
I:13 Train-Acc: 0.286
I:14 Train-Acc: 0.274
I:15 Train-Acc: 0.257
I:16 Train-Acc: 0.243
I:17 Train-Acc: 0.112
I:18 Train-Acc: 0.035
I:19 Train-Acc: 0.026
I:20 Train-Acc: 0.022

With the modification, the training-set accuracy increases much more rapidly than with the original "-=". However, it puzzles me that after 300 iterations both versions reach an accuracy of about 86%. So what's the difference? Does the code have a typo, or have I simply misunderstood it? I posted a question about this on Stack Overflow, and I have checked that I did not mistype the code. So what's wrong?
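
One way I thought of to settle the sign question empirically is a small finite-difference check around the kernels. Here is a generic sketch (my own code, independent of the book; forward_loss below is a hypothetical placeholder for the book's forward pass plus loss, which I am not reproducing):

import numpy as np

def numeric_grad(loss_fn, params, eps=1e-5):
    # Estimate d(loss)/d(params) entry by entry using central differences.
    grad = np.zeros_like(params)
    for i in np.ndindex(params.shape):
        old = params[i]
        params[i] = old + eps
        plus = loss_fn()
        params[i] = old - eps
        minus = loss_fn()
        params[i] = old                # restore the original value
        grad[i] = (plus - minus) / (2 * eps)
    return grad

# Intended use (forward_loss is hypothetical, standing in for the book's forward pass):
#   grad = numeric_grad(lambda: forward_loss(kernels), kernels)
# Gradient descent would then be kernels -= alpha * grad; comparing the sign of grad
# with the sign of k_update shows whether "-=" or "+=" is the consistent update.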

Jason-XII · Dec 07 '23 06:12