Deep_reinforcement_learning_Course

Are the old and the new model effectively the same?

Open yuan1202 opened this issue 6 years ago • 0 comments

Hi Simon

I am looking at your implementation of the PPO model.

After going through the code a couple of times, I think that although you create two policy instances, the reuse parameter passed to the second instance means both instances share the same variables. As a result, you effectively have two identical policies in your model.

Furthermore, I have not seen any code that transfers the weights between the two policies, unlike OpenAI's implementation, which does this:

```python
assign_old_eq_new = U.function([], [], updates=[
    tf.assign(oldv, newv)
    for (oldv, newv) in zipsame(oldpi.get_variables(), pi.get_variables())
])
```
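To illustrate why this matters, here is a minimal plain-Python sketch. It is not the repository's code and not TensorFlow: `TinyPolicy`, `ratio`, and this toy `assign_old_eq_new` are hypothetical stand-ins, and the "gradient step" is just a hand-picked change to one logit. It compares the PPO probability ratio when the old and new policies share parameters with the ratio when the old policy holds its own snapshot.

```python
import math


class TinyPolicy:
    """Hypothetical two-action softmax policy; `logits` stands in for TF variables."""

    def __init__(self, logits):
        self.logits = logits  # mutable list, so two policies can alias it

    def prob(self, action):
        exps = [math.exp(l) for l in self.logits]
        return exps[action] / sum(exps)


def ratio(pi, oldpi, action):
    """PPO probability ratio pi(a) / pi_old(a) for a single action."""
    return pi.prob(action) / oldpi.prob(action)


def assign_old_eq_new(oldpi, pi):
    """Toy analogue of the weight copy: overwrite old values with new ones."""
    oldpi.logits[:] = pi.logits


# Case 1: both policies are built on the same parameter object (like variable reuse).
shared_params = [0.0, 0.0]
pi_shared = TinyPolicy(shared_params)
oldpi_shared = TinyPolicy(shared_params)
pi_shared.logits[0] += 0.5  # pretend update; the "old" policy changes too
shared_ratio = ratio(pi_shared, oldpi_shared, 0)

# Case 2: separate parameters, synced once before the update.
pi_sep = TinyPolicy([0.0, 0.0])
oldpi_sep = TinyPolicy([0.3, -0.3])
assign_old_eq_new(oldpi_sep, pi_sep)  # snapshot the current policy
pi_sep.logits[0] += 0.5  # pretend update; the snapshot stays fixed
separate_ratio = ratio(pi_sep, oldpi_sep, 0)

print("shared parameters:", shared_ratio)
print("separate parameters:", separate_ratio)
```

In the shared case the ratio cannot move away from 1, so the clipping in the PPO objective would never become active. In the separate case the ratio reflects how far the policy has moved since the snapshot.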

Could you please confirm whether this is indeed the case? Thanks!

yuan1202 May 23 '19 09:05