haneen

2 issues by haneen

Dear Ecoffet, As an MSc student, I am currently working on implementing the explore method in the MDPO algorithm, as described in your paper titled "Mirror Descent Policy Optimization" (https://arxiv.org/pdf/2005.09814.pdf)....

When executing the following code:

```python
from stable_baselines3.common.env_util import make_vec_env, make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

envs = make_atari_env("MontezumaRevenge-v4")
envs = VecFrameStack(envs, n_stack=4)
print(envs.observation_space, envs.action_space)
```

the output is: Box(0, 255, (84,...