DDPG-Keras-Torcs

Should the critic network output a single Q-value? The code outputs 3.

Open guo253 opened this issue 7 years ago • 4 comments

Hi, should the critic network output 1 Q-value or 3? The code uses 3, but I don't understand why. Thank you.

guo253 • May 16 '17 08:05

I think it should be 1. I don't know why it is 3.

quhezheng • Jun 20 '17 02:06

Yes, I observed this too. Line 54 of the critic should be V = Dense(1, activation='linear')(h3) instead of V = Dense(action_dim, activation='linear')(h3).

Note that in the Bellman equation we have r + gamma*Q. r is a scalar, and so Q must also be a scalar, otherwise we will end up having 3 Bellman equations!
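To make the scalar argument concrete, here is a minimal sketch of the target computation in plain Python. The names td_targets and GAMMA, the discount value, and the sample numbers are assumptions for illustration and are not taken from the repository; the point is only that each transition yields exactly one target value.

```python
GAMMA = 0.99  # discount factor (assumed value, for illustration only)


def td_targets(rewards, next_q_values, dones, gamma=GAMMA):
    """Return one scalar target y = r + gamma * Q' per transition.

    Terminal transitions (done=True) use y = r, because there is no
    next state to bootstrap from.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        targets.append(r if done else r + gamma * q_next)
    return targets


# Hypothetical batch of three transitions.
rewards = [1.0, 0.5, -1.0]
next_q_values = [10.0, 20.0, 30.0]  # one scalar Q' per transition
dones = [False, False, True]

targets = td_targets(rewards, next_q_values, dones)
print(targets)
```

With a critic that ends in Dense(1), each next_q_values entry is a single number, so the target list has one entry per transition rather than one per action dimension.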

kaushikb258 • Feb 26 '18 22:02

I agree. Line 116 of ddpg.py, y_t = np.asarray([e[1] for e in batch]), is wrong. The correct line should be y_t = np.asarray([e[2] for e in batch]).
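A small sketch of why the index matters, assuming each replay-buffer entry is laid out as (state, action, reward, next_state, done). That layout and the sample values are assumptions for illustration; np.asarray is left out so the sketch has no dependencies.

```python
# Assumed layout of one entry: (state, action, reward, next_state, done).
batch = [
    ([0.1, 0.2], [0.0, 1.0, 0.0], 1.5, [0.2, 0.3], False),
    ([0.3, 0.4], [0.1, 0.9, 0.0], -0.5, [0.4, 0.5], True),
]

actions = [e[1] for e in batch]  # one 3-element action vector per transition
rewards = [e[2] for e in batch]  # one scalar reward per transition

# Initialising the target buffer from e[1] gives 3 slots per transition,
# which only lines up with a critic that has 3 outputs.
y_from_actions = [list(a) for a in actions]

# Initialising it from e[2] gives 1 slot per transition,
# which lines up with a critic that ends in Dense(1).
y_from_rewards = list(rewards)

print(len(y_from_actions[0]), y_from_rewards)
```

Depending on how the critic is trained, the scalar targets may still need to be reshaped to one column per transition before fitting.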

QQwaken • Oct 22 '18 13:10

Were you able to reproduce a model close to the one provided by the author? Following the modifications suggested in other issues, I have been able to train the model, but the car does not drive very well. It often leaves the track and cannot get back onto it. Do you have any suggestions? @QQwaken @guo253 @kaushikb258

Maxwell2017 • Jan 05 '21 03:01