
Would you mind explaining an issue about gradient descent in lecture 1b

Open theanhle opened this issue 7 years ago • 1 comment

  • I've read your slides for lecture 1b (Deep Neural Networks are our Friends). The slide "Gradients are our friends" explains arg min C(w, b) with w0, b0 = 2, 2 and C(w0, b0) = 68, which is correct. But after that, I don't understand why the values of the expression sum(-2(ŷ - y)*x) are 8, -40, -72. I think -8, 40, 72 would be correct.
  • By the way, I implemented this simple network, but when I trained it for 100 epochs the cost did not converge. Here is my code:
import numpy as np

x = np.array([1, 5, 6])
y = np.array([0, 16, 20])
w = 2
b = 2
epoches = 101
learning_rate = 0.05
for epoch in range(epoches):
    out = x*w + b
    cost = np.sum((y - out)**2)
    if epoch % 10 == 0:
        print('Epoch:', epoch, ', cost:', cost)
    dcdw = np.sum(-2*(out - y)*x)
    dcdb = np.sum(-2*(out - y))
    w = w - learning_rate*dcdw
    b = b - learning_rate*dcdb

And here is the result:

Epoch: 0 , cost: 68
Epoch: 10 , cost: 1.1268304493e+19
Epoch: 20 , cost: 3.00027905999e+36
Epoch: 30 , cost: 7.98849058743e+53
Epoch: 40 , cost: 2.12700154184e+71
Epoch: 50 , cost: 5.66331713039e+88
Epoch: 60 , cost: 1.50790492101e+106
Epoch: 70 , cost: 4.01492128811e+123
Epoch: 80 , cost: 1.06900592505e+141
Epoch: 90 , cost: 2.84631649237e+158
Epoch: 100 , cost: 7.57855254577e+175

Could you please explain this? Thank you in advance!

theanhle · Mar 02 '17

Hey, two issues here.

First: your gradient calculation is off. With the cost defined as (y - out)**2, the derivative w.r.t. w is -2*(y - out)*x, not -2*(out - y)*x, so it looks like you just flipped the sign there; the same goes for your gradient w.r.t. b. That flipped sign is also why you expect -8, 40, 72 where the slide has 8, -40, -72.
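
As a quick check (this just plugs in the same x, y, w, b from your own code, nothing new), the correct sign reproduces the slide's numbers:

import numpy as np

x = np.array([1, 5, 6])
y = np.array([0, 16, 20])
w, b = 2, 2

out = w*x + b                      # predictions: [4 12 14]
print(np.sum((y - out)**2))        # cost C(w0, b0): 68
print(-2*(y - out)*x)              # per-example dC/dw terms: 8, -40, -72
print(np.sum(-2*(y - out)*x))      # dC/dw: -104
print(np.sum(-2*(y - out)))        # dC/db: -12

So 8, -40, -72 are the per-example dC/dw terms for this cost, while -8, 40, 72 is what the flipped-sign expression -2*(out - y)*x gives.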

Second: a diverging cost is usually a sign that the learning rate is too high. Try something lower, going down in steps of a factor of 10.
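
For what it's worth, here is a minimal sketch of the corrected loop with both changes. The 0.005 learning rate is just my guess at a value that behaves well for this data; tune it as you like.

import numpy as np

x = np.array([1, 5, 6])
y = np.array([0, 16, 20])
w, b = 2.0, 2.0
learning_rate = 0.005              # 0.05 diverges here; this is one step down by a factor of 10

for epoch in range(101):
    out = w*x + b
    cost = np.sum((y - out)**2)
    if epoch % 10 == 0:
        print('Epoch:', epoch, ', cost:', cost)
    dcdw = np.sum(-2*(y - out)*x)  # sign fixed: derivative of (y - out)**2 w.r.t. w
    dcdb = np.sum(-2*(y - out))    # same fix for b
    w = w - learning_rate*dcdw
    b = b - learning_rate*dcdb

With that, the cost decreases steadily instead of blowing up.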

mleue · Mar 23 '17