DDPG-Keras-Torcs
DDPG replication
Hi,
I believe that in DDPG the critic's value output is a single scalar, not a vector of action size. Hence this line in CriticModel.py
should be
V = Dense(1,activation='linear')(h3)
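For context, a minimal sketch of a critic with a scalar Q head could look like the following. The layer sizes, the hidden-layer names (h1 to h3), and the point where the action branch is merged in are assumptions for illustration, not the exact CriticModel.py architecture:

```python
# Sketch of a DDPG critic with a scalar Q output (sizes/merge point assumed).
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

def build_critic(state_size, action_size):
    S = Input(shape=(state_size,))          # state input
    A = Input(shape=(action_size,))         # action input
    h1 = Dense(300, activation='relu')(S)
    # Merge the action into the second hidden layer, as is common in DDPG critics.
    h2 = Dense(600, activation='relu')(Concatenate()([h1, A]))
    h3 = Dense(600, activation='relu')(h2)
    # Scalar value head: one linear unit, not action_size units.
    V = Dense(1, activation='linear')(h3)
    return Model(inputs=[S, A], outputs=V)
```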
Correspondingly, in ddpg.py
the definition of y_t can be changed to
y_t = np.zeros((states.shape[0],1))
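The targets then fill that (batch, 1) array with the usual Bellman backup. A sketch of how this could look, where GAMMA, dones, and target_q_values (the target critic's scalar prediction for each next state) are assumed names rather than the repo's exact variables:

```python
import numpy as np

GAMMA = 0.99  # discount factor; value assumed

def bellman_targets(rewards, dones, target_q_values):
    """Build (batch, 1) critic targets: y_t = r + GAMMA * Q'(s', mu'(s')).

    rewards, dones have shape (batch,); target_q_values has shape (batch, 1),
    matching the scalar critic output discussed above.
    """
    batch = rewards.shape[0]
    y_t = np.zeros((batch, 1))
    for k in range(batch):
        # Terminal transitions bootstrap nothing; otherwise add the discounted Q'.
        y_t[k] = rewards[k] if dones[k] else rewards[k] + GAMMA * target_q_values[k]
    return y_t
```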
Although I'm not sure how this would affect learning, I believe this is the right way to replicate DDPG.
I think you are right. Because the reward is a scalar, it cannot serve as a target for a vector-valued Q.
Hi @sahiliitm, I want to ask: did you get the same or better results after changing these two lines?