
Gaussian distribution of the policy

Open — nyck33 opened this issue 5 years ago · 1 comment

nyck33 · May 25 '19 16:05

I am just learning how to implement PPO for continuous action spaces, and this repo has been a godsend except for one point. If we assume that the output of the actor is mu, the mean of the policy, then I can see how the new action would always have a lower probability when evaluated under the old policy's distribution. Thus, the ratio of the new policy's probability to the old policy's probability would always be between 0 and 1, so the 1 + epsilon clip could be omitted; i.e., clipping at 1 - epsilon alone would work in all cases. Am I on the right path here? I know this is not Stack Overflow, but nobody is answering me over there...
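
For context, here is roughly how I understand the ratio and clipped objective being computed for a Gaussian policy (a minimal sketch, not this repo's code; `mu_old`, `mu_new`, the stds, and the advantage value are placeholders I made up):

```python
import torch
from torch.distributions import Normal

# Placeholder values; in practice mu/std come from the actor network.
mu_old, std_old = torch.tensor([0.0]), torch.tensor([1.0])
mu_new, std_new = torch.tensor([0.3]), torch.tensor([1.0])
eps = 0.2  # PPO clip parameter

# The action was sampled from the OLD policy during the rollout.
action = Normal(mu_old, std_old).sample()

# Log-probability of that same action under both policies.
log_prob_old = Normal(mu_old, std_old).log_prob(action)
log_prob_new = Normal(mu_new, std_new).log_prob(action)

# Probability ratio pi_new(a) / pi_old(a), computed in log space.
ratio = (log_prob_new - log_prob_old).exp()

advantage = torch.tensor([1.0])  # placeholder advantage estimate
surr1 = ratio * advantage
surr2 = torch.clamp(ratio, 1 - eps, 1 + eps) * advantage
loss = -torch.min(surr1, surr2).mean()
```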