DDPG-Keras-Torcs
DDPG replication
Hi,
I believe that in DDPG the critic's value output is a single scalar, not a vector of action size. Hence this line in CriticModel.py
should be
V = Dense(1,activation='linear')(h3)
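For context, a minimal sketch of a critic with a scalar Q head could look like the following. The layer sizes, the hidden-layer names (h1 to h3), and the point where the action branch is merged in are assumptions for illustration, not the exact CriticModel.py architecture:

```python
# Sketch of a DDPG critic with a scalar Q output (sizes/merge point assumed).
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

def build_critic(state_size, action_size):
    S = Input(shape=(state_size,))          # state input
    A = Input(shape=(action_size,))         # action input
    h1 = Dense(300, activation='relu')(S)
    # Merge the action into the second hidden layer, as is common in DDPG critics.
    h2 = Dense(600, activation='relu')(Concatenate()([h1, A]))
    h3 = Dense(600, activation='relu')(h2)
    # Scalar value head: one linear unit, not action_size units.
    V = Dense(1, activation='linear')(h3)
    return Model(inputs=[S, A], outputs=V)
```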
Correspondingly, in ddpg.py
the definition of y_t can be changed to
y_t = np.zeros((states.shape[0],1))
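The targets then fill that (batch, 1) array with the usual Bellman backup. A sketch of how this could look, where GAMMA, dones, and target_q_values (the target critic's scalar prediction for each next state) are assumed names rather than the repo's exact variables:

```python
import numpy as np

GAMMA = 0.99  # discount factor; value assumed

def bellman_targets(rewards, dones, target_q_values):
    """Build (batch, 1) critic targets: y_t = r + GAMMA * Q'(s', mu'(s')).

    rewards, dones have shape (batch,); target_q_values has shape (batch, 1),
    matching the scalar critic output discussed above.
    """
    batch = rewards.shape[0]
    y_t = np.zeros((batch, 1))
    for k in range(batch):
        # Terminal transitions bootstrap nothing; otherwise add the discounted Q'.
        y_t[k] = rewards[k] if dones[k] else rewards[k] + GAMMA * target_q_values[k]
    return y_t
```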
Although I'm not sure how this would affect learning, I believe this is the right way to replicate DDPG.
I think you are right. Because the reward is a scalar, it cannot serve as a target for a vector-valued Q.
Hi @sahiliitm, I want to ask: did you get the same or better results after changing these two lines?