Would you mind explaining an issue about gradient descent in lecture 1b?
- I've read your slides in lecture 1b ("Deep neural networks are our friends"). On the slide "Gradients are our friends", explaining arg min C(w, b): w0, b0 = 2, 2; C(w0, b0) = 68. This is correct. But after that, I don't understand why the results of the expression sum(-2(ŷ - y)*x) are 8, -40, -72. I think -8, 40, 72 are correct.
- By the way, I implemented this simple network, but when I trained it for 100 epochs, the cost did not converge. Here is my code:
import numpy as np
x=np.array([1,5,6])
y=np.array([0,16,20])
w = 2
b = 2
epoches = 101
learning_rate = 0.05
for epoch in range(epoches):
    out = x*w + b
    cost = np.sum((y - out)**2)
    if epoch % 10 == 0:
        print('Epoch:', epoch, ', cost:', cost)
    dcdw = np.sum(-2*(out - y)*x)
    dcdb = np.sum(-2*(out - y))
    w = w - learning_rate*dcdw
    b = b - learning_rate*dcdb
And here is the result:
Epoch: 0 , cost: 68
Epoch: 10 , cost: 1.1268304493e+19
Epoch: 20 , cost: 3.00027905999e+36
Epoch: 30 , cost: 7.98849058743e+53
Epoch: 40 , cost: 2.12700154184e+71
Epoch: 50 , cost: 5.66331713039e+88
Epoch: 60 , cost: 1.50790492101e+106
Epoch: 70 , cost: 4.01492128811e+123
Epoch: 80 , cost: 1.06900592505e+141
Epoch: 90 , cost: 2.84631649237e+158
Epoch: 100 , cost: 7.57855254577e+175
Could you explain this to me? Thank you in advance!
Hey, two issues here.
First: your gradient calculation is off. When you define the cost as (y - out)**2, the derivative w.r.t. w is -2*(y - out)*x, not -2*(out - y)*x. It looks like you just flipped the sign there. The same issue applies to your gradient w.r.t. b.
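To make the sign concrete, here is a quick check with the numbers from your question (x = [1, 5, 6], y = [0, 16, 20], w0 = b0 = 2), evaluating the per-sample gradient terms w.r.t. w:

```python
import numpy as np

x = np.array([1, 5, 6])
y = np.array([0, 16, 20])
w, b = 2, 2

out = x * w + b               # predictions: [4, 12, 14]
terms = -2 * (y - out) * x    # per-sample gradient terms of (y - out)**2 w.r.t. w
print(terms)                  # [8, -40, -72] -- the slide's values
```

So the values 8, -40, -72 on the slide are correct for the cost (y - out)**2; your values -8, 40, 72 come from the flipped difference (out - y).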
Second: a diverging cost is usually a sign of a too-high learning rate. Try something lower; go in steps of dividing by 10.
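Putting both fixes together, here is a corrected sketch of your loop (the learning rate 0.005 and epoch count are just starting guesses, not tuned values; the data y = 4x - 4 is fit exactly, so the cost should approach 0):

```python
import numpy as np

x = np.array([1, 5, 6])
y = np.array([0, 16, 20])
w, b = 2.0, 2.0
learning_rate = 0.005  # lowered from 0.05; divide by 10 again if the cost still diverges
epochs = 1000

for epoch in range(epochs):
    out = x * w + b
    cost = np.sum((y - out) ** 2)
    if epoch % 100 == 0:
        print('Epoch:', epoch, ', cost:', cost)
    # cost is (y - out)**2, so the chain rule gives -2*(y - out)*(...), not -2*(out - y)*(...)
    dcdw = np.sum(-2 * (y - out) * x)
    dcdb = np.sum(-2 * (y - out))
    w = w - learning_rate * dcdw
    b = b - learning_rate * dcdb
```

With the sign fixed and the smaller step, w and b should settle near the exact solution w = 4, b = -4.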