DDPG-Keras-Torcs: Should the critic network's output (Q-value) have size 1? The code uses 3.

Open guo253 opened this issue 7 years ago • 4 comments

Hi, I would like to know whether the critic network should output 1 Q-value or 3. It is 3 in the code, but I don't know the reason. Thank you.

May 16 '17 08:05 guo253

I think it should be 1. I don't know why it is 3.

Jun 20 '17 02:06 quhezheng

Yes, I observed this too. Line 54 of the critic should be V = Dense(1, activation='linear')(h3) instead of V = Dense(action_dim, activation='linear')(h3).

Note that in the Bellman equation we have r + gamma*Q. r is a scalar, and so Q must also be a scalar, otherwise we will end up having 3 Bellman equations!
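
For reference, the scalar argument above matches the usual way the DDPG critic update is written. The symbols Q' and \mu' (target critic and target actor) and the batch size N are not named in this thread; they follow the standard notation and are assumptions here.

```latex
y_i = r_i + \gamma \, Q'\big(s_{i+1}, \mu'(s_{i+1})\big), \qquad
L = \frac{1}{N} \sum_{i=1}^{N} \big(y_i - Q(s_i, a_i)\big)^2
```

Each y_i is a single number because r_i is, so the critic's output Q(s_i, a_i) has to be a single number as well for the squared error to compare like with like.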

Feb 26 '18 22:02 kaushikb258

I agree. In line 116 of ddpg.py, y_t = np.asarray([e[1] for e in batch]) is wrong. The correct line should be y_t = np.asarray([e[2] for e in batch]).
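
A minimal sketch of why the index matters, assuming each replay entry is laid out as (state, action, reward, next_state, done), so that e[1] is the action and e[2] is the reward. The helper td_targets, the GAMMA value, and the sample data are hypothetical and not taken from ddpg.py. With a scalar critic, the target list needs one entry per transition, which is the shape of the rewards rather than of the actions.

```python
GAMMA = 0.99  # discount factor; an assumed value for illustration


def td_targets(batch, target_q_values, gamma=GAMMA):
    """Return one scalar Bellman target per transition.

    batch: list of (state, action, reward, next_state, done) tuples
           (assumed layout, see the note above).
    target_q_values: one scalar Q'(s', a') per transition, as a critic
                     with a single output unit would produce.
    """
    targets = []
    for (_, _, reward, _, done), q_next in zip(batch, target_q_values):
        if done:
            # Terminal transition: no bootstrapped future value.
            targets.append(reward)
        else:
            targets.append(reward + gamma * q_next)
    return targets


# Two made-up transitions with 3-dimensional actions.
batch = [
    ([0.1, 0.2], [0.0, 1.0, 0.0], 1.0, [0.2, 0.3], False),
    ([0.2, 0.3], [0.1, 0.9, 0.0], -1.0, [0.0, 0.0], True),
]
target_q_values = [2.0, 5.0]

y_t = td_targets(batch, target_q_values)
print(len(y_t))  # one target per transition, not one per action dimension
```

If the targets were shaped like the actions instead, every transition would carry three target values, which only lines up with a critic that has three outputs.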

Oct 22 '18 13:10 QQwaken

Have you been able to reproduce a model close to the one given by the author? Following the modifications suggested in other issues, I have been able to train the model, but the car does not drive very well. It often leaves the track and cannot return to it. Do you have any suggestions? @QQwaken @guo253 @kaushikb258

Jan 05 '21 03:01 Maxwell2017
