[Question] Pong environment with A2C not learning with example code

Open · Tanis1304 opened this issue 1 year ago · 1 comment

❓ Question

I copied the code from the Examples section of the documentation, which uses a PongNoFrameskip-v4 environment with 4 stacked frames. The mean episodic reward starts at around -20, then worsens and ends up fluctuating between -21 and -20.5. I use the default hyperparameters of A2C with the CNN policy, as the code below shows.

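from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

vec_env = make_atari_env("PongNoFrameskip-v4", n_envs=4, seed=0)
vec_env = VecFrameStack(vec_env, n_stack=4)

# Default A2C hyperparameters; only the device is set explicitly
model = A2C("CnnPolicy", vec_env, verbose=1, device='cuda')
model.learn(total_timesteps=10_000_000)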

I'm running this code using Python 3.10.4 and torch 2.3.0. What could be going wrong here, and shouldn't this example code just work?

Tanis1304 · May 03 '24 11:05

If you want the correct hyperparameters for Atari, you should use the RL Zoo. The example in the documentation is there to show the API; we kept it concise to focus on the wrappers we provide.

araffin · May 03 '24 12:05
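
For readers who want to see what moving away from the defaults might look like without switching to the RL Zoo, here is a minimal sketch. The specific values (16 environments, `ent_coef=0.01`, `vf_coef=0.25`, and the TensorFlow-style RMSprop optimizer with `eps=1e-5`) are assumptions modeled on the RL Zoo's Atari settings for A2C and should be checked against the Zoo's own configuration file. `ATARI_A2C` and `train_pong` are hypothetical names introduced only for this illustration.

```python
# Sketch only: A2C on Pong with Atari-oriented settings instead of the defaults.
# The numbers are assumptions modeled on the RL Zoo's Atari entry for A2C;
# the Zoo's configuration remains the authoritative source.
ATARI_A2C = {
    "n_envs": 16,  # more parallel environments than the 4 used in the question
    "n_stack": 4,  # frame stacking, unchanged from the question
    "total_timesteps": 10_000_000,
    "model_kwargs": {"ent_coef": 0.01, "vf_coef": 0.25},
}


def train_pong(config=ATARI_A2C, device="auto"):
    """Build and train an A2C agent on PongNoFrameskip-v4 (hypothetical helper)."""
    # Imported lazily so the config can be inspected without SB3 installed.
    from stable_baselines3 import A2C
    from stable_baselines3.common.env_util import make_atari_env
    from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike
    from stable_baselines3.common.vec_env import VecFrameStack

    # Same wrappers as in the question, with the number of environments from config.
    vec_env = make_atari_env("PongNoFrameskip-v4", n_envs=config["n_envs"], seed=0)
    vec_env = VecFrameStack(vec_env, n_stack=config["n_stack"])

    # TensorFlow-style RMSprop, passed to the policy through policy_kwargs.
    policy_kwargs = dict(
        optimizer_class=RMSpropTFLike,
        optimizer_kwargs=dict(eps=1e-5),
    )
    model = A2C(
        "CnnPolicy",
        vec_env,
        policy_kwargs=policy_kwargs,
        verbose=1,
        device=device,
        **config["model_kwargs"],
    )
    model.learn(total_timesteps=config["total_timesteps"])
    return model
```

Calling `train_pong(device="cuda")` would start a full training run. Whether these settings are enough for Pong to learn in a given setup is not something this sketch guarantees; the RL Zoo remains the recommended route.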