cleanba Missing clipped value loss in PPO implementation

Open francelico opened this issue 1 year ago • 1 comments

Hi @vwxyzjn ,

This codebase is great, thanks for the hard work! I've been using it to run baseline experiments in procgen, and I've noticed that your implementation of PPO does not use value loss clipping. However, it is enabled by default in the PyTorch implementation most often used in papers that test agents in procgen.
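For reference, here is a minimal, framework-agnostic sketch of what value loss clipping usually looks like in PPO codebases. It is not taken from cleanba or from any specific PyTorch repository: the function names, the plain-list inputs, and the default `clip_coef=0.2` are assumptions made for illustration.

```python
def unclipped_value_loss(new_values, returns):
    """Plain mean squared error value loss (with the usual 0.5 factor)."""
    n = len(new_values)
    return 0.5 * sum((v - r) ** 2 for v, r in zip(new_values, returns)) / n


def clipped_value_loss(new_values, old_values, returns, clip_coef=0.2):
    """PPO-style clipped value loss over a batch of scalar predictions.

    new_values: value predictions from the current network
    old_values: value predictions stored at rollout time
    returns:    return targets (e.g. GAE advantages + old values)
    clip_coef:  hypothetical default; codebases often reuse the policy clip range
    """
    total = 0.0
    for v_new, v_old, ret in zip(new_values, old_values, returns):
        # Squared error of the raw prediction.
        loss_unclipped = (v_new - ret) ** 2
        # Keep the new prediction within clip_coef of the old one.
        delta = max(-clip_coef, min(clip_coef, v_new - v_old))
        v_clipped = v_old + delta
        loss_clipped = (v_clipped - ret) ** 2
        # Pessimistic choice: take the larger of the two errors.
        total += max(loss_unclipped, loss_clipped)
    return 0.5 * total / len(new_values)
```

When the new prediction stays within `clip_coef` of the old one, both functions return the same value; they only differ once the value estimate moves further than that in a single update.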

Is there a reason why it was left out? I'm not super familiar with ALE, perhaps it is not as common there?

As part of my project I've created scripts to train and evaluate PPO in procgen* and I've implemented the DAAC agent (https://arxiv.org/abs/2102.10330). Would you like me to open a PR to add them to cleanba?

*On top of re-implementing value loss clipping in PPO, I found minor differences between the Atari and procgen environments, such as the info dict returned by envpool.step() being slightly different, and the videos in the eval script supporting grayscale images only.

Feb 15 '24 16:02 francelico

Hi @francelico, thanks for the message. I turned it off because, in practice, it didn't seem to matter that much to the performance. As much as I'd love to have a DAAC agent in Cleanba, maybe not for now as this repo is mainly for distributed DRL stuff and kind of archived.

Feb 16 '24 02:02 vwxyzjn
