
DDPG replication

Open sahiliitm opened this issue 8 years ago • 2 comments

Hi,

I believe that in DDPG the value function outputs a single scalar, not a vector the size of the action. Hence this line in CriticModel.py should be

V = Dense(1,activation='linear')(h3)

Correspondingly, in ddpg.py the definition of y_t can be changed to

y_t = np.zeros((states.shape[0],1))

Although I'm not sure how this would affect learning, I believe this is the right way to replicate ddpg.
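
The shape change can be sketched with plain NumPy (a minimal illustration, not code from this repo; `rewards`, `dones`, and `target_q` are hypothetical placeholders standing in for the replay batch and the target critic's output): with `Dense(1)` the critic emits one scalar per sample, so the Bellman target `y_t` has shape `(batch, 1)` rather than `(batch, action_dim)`.

```python
import numpy as np

# Hypothetical batch of 4 transitions (placeholders, not from the repo).
batch_size = 4
rewards = np.array([0.5, -0.1, 1.0, 0.3])       # one scalar reward per transition
dones = np.array([False, False, True, False])   # episode-termination flags
gamma = 0.99                                    # discount factor

# Stand-in for the target critic Q'(s', mu'(s')): with Dense(1) it
# returns one scalar per sample, i.e. shape (batch, 1).
target_q = np.array([[0.2], [0.4], [0.9], [0.1]])

# Bellman target: y_t is (batch, 1) -- one scalar per transition,
# matching the proposed y_t = np.zeros((states.shape[0], 1)).
y_t = np.zeros((batch_size, 1))
for k in range(batch_size):
    if dones[k]:
        y_t[k] = rewards[k]                     # terminal: target is just the reward
    else:
        y_t[k] = rewards[k] + gamma * target_q[k]

print(y_t.shape)  # (4, 1)
```

Note that each target is a single number, which is consistent with the reward being a scalar; a critic head of size `action_dim` would have no principled per-dimension target to regress against.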

sahiliitm avatar Oct 17 '16 07:10 sahiliitm

I think you are right. Because the reward is a scalar, it cannot serve as a target for a vector-valued Q.

quhezheng avatar Jun 19 '17 02:06 quhezheng

Hi @sahiliitm, I want to ask: did you get the same or better results after changing the two lines here?

LevineYang avatar Mar 29 '18 00:03 LevineYang