rl-baselines3-zoo
[Question] Results vastly different for an agent created with Stable Baselines3 using hyperparameters optimized in RL Baselines3 Zoo.
❓ Question
Hello. I first ran hyperparameter optimization for A2C over 1 million steps using RL Baselines3 Zoo.
To do so, I changed a2c.yml in RL Baselines3 Zoo to work with the RAM version of Seaquest:
```yaml
atari:
  policy: 'MlpPolicy'
  n_envs: 16
  policy_kwargs: "dict(optimizer_class=RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5))"
```
Then I ran the following command:
```
python -m train --algo a2c --env ALE/Seaquest-ram-v5 -n 1000000 -optimize --n-trials 100 --n-startup-trials 10 --sampler tpe --pruner median --n-evaluations 4 --n-eval-envs 16 --storage "some_valid_database" --study-name test
```
Top 3 results:
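For completeness, the same top trials can be read back from the Optuna storage. Here is a minimal sketch, assuming a SQLite storage URL such as `sqlite:///a2c.db` (a hypothetical stand-in; the real storage URL is elided above):

```python
import optuna

# Load the finished study; the study name matches --study-name above,
# the storage URL is a hypothetical stand-in for "some_valid_database".
study = optuna.load_study(study_name="test", storage="sqlite:///a2c.db")

# Keep only trials that finished with a value, sort descending, print the top 3.
completed = [t for t in study.trials if t.value is not None]
for trial in sorted(completed, key=lambda t: t.value, reverse=True)[:3]:
    print(trial.number, trial.value, trial.params)
```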
Then I took, for example, this set of hyperparameters and used the following code:
```python
import torch
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike


def linear_decay_lr(progress_remaining):
    # Linear schedule: starts at ~2.72e-4 and decays to 0 over training.
    return 0.00027232300584036946 * progress_remaining


if __name__ == "__main__":
    vec_env = make_vec_env("ALE/Seaquest-ram-v5", n_envs=16)
    model = A2C(
        "MlpPolicy",
        vec_env,
        learning_rate=linear_decay_lr,
        n_steps=256,
        gamma=0.999,
        gae_lambda=0.98,
        ent_coef=0.00001753537605091099,
        vf_coef=0.19195701505334234,
        max_grad_norm=0.5,
        use_rms_prop=True,
        normalize_advantage=False,
        verbose=1,
        tensorboard_log="./seaquest/107",
        policy_kwargs=dict(
            activation_fn=torch.nn.Tanh,
            net_arch=dict(pi=[256, 256], vf=[256, 256]),
            ortho_init=True,
            optimizer_class=RMSpropTFLike,
            optimizer_kwargs=dict(eps=1e-5),
        ),
    )
    model.learn(total_timesteps=1000000, log_interval=1)
```
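To compare against the zoo's reported score, the trained model can also be evaluated explicitly after training. Below is a minimal sketch using SB3's standard `evaluate_policy` helper (`n_eval_episodes=10` and `deterministic=True` are my assumptions, not values taken from the zoo run):

```python
from stable_baselines3.common.evaluation import evaluate_policy

# Sketch: mean episodic reward over a fixed number of evaluation episodes
# (the episode count is an arbitrary choice for illustration).
mean_reward, std_reward = evaluate_policy(
    model, vec_env, n_eval_episodes=10, deterministic=True
)
print(f"Mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```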
I get the following results:
As the picture shows, the result is a long way from the 456 that RL Baselines3 Zoo reached. I have tried other sets of hyperparameters, but the scores are always much lower. One factor I am aware of that can affect this is the seed, as I did not use the same one. Nevertheless, I have trained many A2C instances and the problem remains.
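One thing that could still be pinned down is the seed. A minimal sketch of fixing it explicitly (the seed value 42 is arbitrary, and the remaining A2C hyperparameters from the script above would be passed unchanged):

```python
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.utils import set_random_seed

set_random_seed(42)  # seeds Python's random, NumPy, and PyTorch; 42 is arbitrary
vec_env = make_vec_env("ALE/Seaquest-ram-v5", n_envs=16, seed=42)
# Same hyperparameters as in the script above, plus an explicit seed:
model = A2C("MlpPolicy", vec_env, seed=42, verbose=1)
```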
Checklist
- [X] I have checked that there is no similar issue in the repo
- [X] I have read the SB3 documentation
- [X] I have read the RL Zoo documentation
- [X] If code there is, it is minimal and working
- [X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.