baselines
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
I am wondering about the normalization of the advantage function in PPO. Before training on a batch, the mean of the advantage function is subtracted and it is divided by its...
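The normalization the question describes (subtract the batch mean, divide by the batch standard deviation) can be sketched as follows; the function name and the `eps` guard are illustrative, not the exact code in the repo:

```python
import numpy as np

def normalize_advantages(advs, eps=1e-8):
    # Standardize advantages per batch: zero mean, unit standard deviation.
    # eps guards against division by zero when all advantages are equal.
    return (advs - advs.mean()) / (advs.std() + eps)

advs = np.array([1.0, 2.0, 3.0, 4.0])
norm = normalize_advantages(advs)
```

This rescaling does not change which actions look better than average within a batch; it only stabilizes the scale of the policy-gradient loss across batches.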
I'm trying to run the following code and test PPO with Sonic the Hedgehog, running it in parallel with SubprocVecEnv. Unfortunately I run into the following error: ``` Traceback (most...
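For context, a vectorized env exposes batched `reset`/`step` over several environments at once. Below is a minimal thread-based sketch of that interface; the real SubprocVecEnv instead runs one worker *process* per environment and pickles the env constructors across pipes, which is a common source of errors when running in parallel. `ToyEnv` and `MiniVecEnv` are hypothetical names for illustration only:

```python
from concurrent.futures import ThreadPoolExecutor

class ToyEnv:
    """Stand-in environment (hypothetical): the observation counts steps taken."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        # (obs, reward, done, info) -- the usual gym step signature
        return self.t, float(action), False, {}

class MiniVecEnv:
    """Thread-based sketch of the VecEnv interface that SubprocVecEnv
    implements with one worker process per environment."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]
        self.pool = ThreadPoolExecutor(max_workers=len(self.envs))
    def reset(self):
        return list(self.pool.map(lambda e: e.reset(), self.envs))
    def step(self, actions):
        results = list(self.pool.map(lambda ea: ea[0].step(ea[1]),
                                     zip(self.envs, actions)))
        obs, rews, dones, infos = zip(*results)
        return list(obs), list(rews), list(dones), list(infos)

venv = MiniVecEnv([ToyEnv for _ in range(4)])
obs0 = venv.reset()                              # [0, 0, 0, 0]
obs1, rews, dones, infos = venv.step([1, 2, 3, 4])
```

With the process-based version, the env factory passed in must be picklable (baselines wraps it in cloudpickle for that reason), so closures over unpicklable state often fail only at launch time.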
Dear @pzhokhov @matthiasplappert @christopherhesse et al., Thank you for providing an implementation of DDPG. However, I have been unable to get it to learn well on the standard MuJoCo environments...
When I try to run the colab example with the A2C algorithm on an Atari env, I get the following error: `--------------------------------------------------------------------------- ConnectionResetError Traceback (most recent call last) in ()...
Fix benchmark links in README.md
I got an error when trying to log tensorboard output in the TF2 branch: > set OPENAI_LOG_FORMAT=stdout,log,csv,tensorboard > python -m baselines.run --alg=ppo2 --env=CartPole-v0 --network=mlp --save_path model --log_path log/ --num_timesteps=30000 --nsteps=128...
Hello. Can you please explain why you are using `mb_returns = mb_advs(GAE) + mb_values` as the returns to compute the critic loss? Should not the value function approximately...
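The identity the question asks about falls out of how GAE is computed: adding the lambda-weighted advantages back onto the value baseline yields a TD(lambda)-style return, which is what the critic regresses toward. A minimal sketch (function name is illustrative; episode terminations are ignored for brevity, whereas the real rollout code masks with `dones`):

```python
import numpy as np

def gae_returns(rewards, values, last_value, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation over a rollout of length T.
    # values[t] is V(s_t); last_value bootstraps beyond the rollout.
    T = len(rewards)
    advs = np.zeros(T)
    lastgaelam = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        lastgaelam = delta + gamma * lam * lastgaelam
        advs[t] = lastgaelam
    # Critic targets: advantages plus the value baseline. The V(s_t)
    # terms telescope out, leaving a lambda-weighted mixture of
    # n-step returns rather than the raw value prediction itself.
    returns = advs + values
    return advs, returns
```

So the critic is not trained toward its own current estimate: `advs + values` cancels the baseline and leaves a bootstrapped empirical return.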