
PPO and Reward

Open yangtianyong opened this issue 6 years ago • 0 comments

Hello Zhou: I am confused about how the reward guides PPO in training the neural networks.

1. For example, if I feed a batch of data to the networks, I get a reward. Does that reward act through the PPO gradient formula? (I mean, in the gradient formula we multiply by R.)
2. When I get a series of rewards, I want to know if my understanding is right: I keep a running sum of the rewards, adding each new one as it arrives, and then maximizing that total with the PPO gradient formula gives the best policy. Am I right? (See the sketch below.)

Please excuse my poor English; I hope my description is clear!

Thank you for your advice!
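For reference, here is a minimal sketch (in plain NumPy, not the repo's actual code) of how PPO typically turns rewards into a training signal. The key points it illustrates: the raw reward is not multiplied into the gradient directly; rewards are first accumulated backwards into discounted returns, a value baseline is subtracted to form advantages, and the policy is updated via the clipped surrogate objective. All names, the discount factor `gamma`, and the toy numbers below are illustrative assumptions, not values from this repository.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.9):
    """Accumulate rewards backwards: G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate: mean(min(r*A, clip(r, 1-eps, 1+eps)*A))."""
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return np.mean(np.minimum(ratio * advantage, clipped * advantage))

# Toy batch: per-step rewards from one episode and hypothetical critic values
rewards = np.array([1.0, 0.0, 0.5, 1.0])
values = np.array([0.8, 0.6, 0.7, 0.9])       # critic's value estimates (assumed)
returns = discounted_returns(rewards, gamma=0.9)
advantages = returns - values                  # reward enters the gradient via A, not raw R
ratios = np.array([1.1, 0.9, 1.3, 0.7])        # pi_new(a|s) / pi_old(a|s), assumed
print(ppo_clip_objective(ratios, advantages))  # objective to maximize by gradient ascent
```

So the answer to question 2 is roughly: yes, PPO maximizes an expected sum of (discounted) rewards, but via advantage estimates inside the clipped objective rather than by multiplying the gradient by a running reward total.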

yangtianyong · Aug 23 '19 08:08