stable-baselines

PPO2 implementation details?

Open FabioPINO opened this issue 4 years ago • 3 comments

Where can I find the implementation details that differentiate the PPO2 algorithm from the original version reported in Proximal Policy Optimization Algorithms by Schulman?

FabioPINO, Sep 29 '21 14:09
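For reference, the clipped surrogate objective from the Schulman paper, which is the baseline any implementation-specific detail is measured against, can be sketched as below. This is a minimal plain-Python illustration, not code from stable-baselines; the function name `clipped_surrogate` and its argument names are assumptions made for this example, and `clip_range=0.2` is just one commonly used value.

```python
import math


def clipped_surrogate(log_probs_new, log_probs_old, advantages, clip_range=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    Hypothetical helper for illustration only; not part of any library API.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_range, 1 + clip_range].
        clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        # Pessimistic bound: take the smaller of the clipped and unclipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

With a positive advantage the gain from raising the ratio is capped at `1 + clip_range`, while with a negative advantage a ratio above that bound is not clipped away, so the penalty for a bad update is kept in full.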

I do not think there is an exhaustive document on this. For a closer match with Schulman's paper, check out the original baselines repository. I think there have been some small changes to PPO2 over the years, but nothing major (e.g. fixing off-by-one mistakes and such).

Miffyli, Sep 29 '21 14:09

I think Costa's blog is currently the best place to find all the implementation details that are in PPO: https://costa.sh/blog-the-32-implementation-details-of-ppo.html

But the best option now is also to look at the SB3 code ;)

araffin, Sep 29 '21 14:09

Thank you for your prompt replies @Miffyli and @araffin! More specifically, what are the code-level optimizations used in PPO2? And, in addition, how is exploration carried out?

FabioPINO, Sep 29 '21 14:09
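As a rough pointer for this follow-up question: code-level details commonly attributed to baselines-style PPO2 include advantage normalization and a clipped value-function loss, and exploration typically comes from sampling actions from the stochastic policy, with an entropy bonus in the loss to discourage the policy from collapsing too early. The sketch below illustrates these ideas in plain Python. It is not taken from the stable-baselines source; all function names are hypothetical, a one-dimensional Gaussian policy is assumed, and the exact behavior should be checked against the PPO2 code itself.

```python
import math
import random


def normalize_advantages(advantages, eps=1e-8):
    """Rescale advantages to zero mean and (approximately) unit standard deviation."""
    mean = sum(advantages) / len(advantages)
    var = sum((a - mean) ** 2 for a in advantages) / len(advantages)
    return [(a - mean) / (math.sqrt(var) + eps) for a in advantages]


def clipped_value_loss(values_new, values_old, returns, clip_range=0.2):
    """Value loss where the new prediction may move at most clip_range from the old one."""
    total = 0.0
    for v_new, v_old, ret in zip(values_new, values_old, returns):
        v_clipped = v_old + min(max(v_new - v_old, -clip_range), clip_range)
        # Take the larger (more pessimistic) of the two squared errors.
        total += max((v_new - ret) ** 2, (v_clipped - ret) ** 2)
    return 0.5 * total / len(returns)


def sample_gaussian_action(mean, log_std, rng=random):
    """Exploration by sampling: draw an action from N(mean, exp(log_std)^2)."""
    return rng.gauss(mean, math.exp(log_std))


def gaussian_entropy(log_std):
    """Entropy of a 1-D Gaussian; an entropy bonus adds a multiple of this to the objective."""
    return 0.5 * math.log(2.0 * math.pi * math.e) + log_std
```

In this sketch, a larger `log_std` means wider action sampling and higher entropy, so weighting the entropy term more heavily keeps exploration going for longer.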