rl-baselines3-zoo
[Other] Optimized Hyperparameters for LunarLanderContinuous-v2 with PPO
I needed to do a run with PPO on a Gym environment on my cluster to make sure everything was working right before moving on to tuning PettingZoo environments, so I tried a combination that no one had done before:
python3 train.py --algo ppo --env LunarLanderContinuous-v2 -n 2000000 -optimize --n-trials 1000 --n-jobs 2 --sampler tpe --pruner median --study-name lunar_lander_1 --storage mysql://[redacted]
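For context, here is a minimal sketch of roughly what that -optimize run sets up under the hood, assuming the standard Optuna API (TPE sampler, median pruner, shared storage). The toy objective below is my own placeholder, not the zoo's real sampling code, and I swapped the MySQL storage for a local SQLite file so the snippet runs standalone:

import optuna
from optuna.pruners import MedianPruner
from optuna.samplers import TPESampler
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial):
    # Placeholder objective: sample two of the hyperparameters from the
    # command above, do a very short PPO run, and report the mean
    # evaluation reward. The zoo's real objective samples many more
    # parameters and trains for far longer.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)
    model = PPO("MlpPolicy", "LunarLanderContinuous-v2",
                learning_rate=lr, gamma=gamma, verbose=0)
    model.learn(total_timesteps=10_000)
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=5)
    return mean_reward

study = optuna.create_study(
    study_name="lunar_lander_1",
    storage="sqlite:///lunar_lander_1.db",  # the run above used a shared MySQL DB instead
    sampler=TPESampler(),
    pruner=MedianPruner(),
    direction="maximize",
    load_if_exists=True,
)
study.optimize(objective, n_trials=1000, n_jobs=2)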
I ran that command on 10 GPUs for a little under 24 hours and got through about 170 trials before the improvements stopped being meaningful and almost every new trial was pruned. These were the best parameters (keep in mind that a score of 200 counts as "solved" here):
[I 2021-04-15 15:13:02,318] Trial 134 finished with value: 307.1917026 and parameters: {'batch_size': 256, 'n_steps': 256, 'gamma': 0.995, 'lr': 0.000803803946053569, 'ent_coef': 3.2165680942085065e-07, 'clip_range': 0.2, 'n_epochs': 5, 'gae_lambda': 0.99, 'max_grad_norm': 2, 'vf_coef': 0.8682145978405473, 'net_arch': 'small', 'activation_fn': 'relu'}. Best is trial 134 with value: 307.192.
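If anyone wants to try these outside the zoo, here is a rough sketch of how trial 134's values map onto stable-baselines3's PPO constructor. The net_arch "small" -> two 64-unit layers mapping is my assumption based on the zoo's convention, so double-check it against the zoo's hyperparameter sampler before relying on it:

import torch.nn as nn
from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    "LunarLanderContinuous-v2",
    batch_size=256,
    n_steps=256,
    gamma=0.995,
    learning_rate=0.000803803946053569,
    ent_coef=3.2165680942085065e-07,
    clip_range=0.2,
    n_epochs=5,
    gae_lambda=0.99,
    max_grad_norm=2,
    vf_coef=0.8682145978405473,
    policy_kwargs=dict(
        # "small" assumed to mean separate 64x64 policy/value nets;
        # newer SB3 versions expect a plain dict instead of a one-element list.
        net_arch=[dict(pi=[64, 64], vf=[64, 64])],
        activation_fn=nn.ReLU,
    ),
    verbose=1,
)
model.learn(total_timesteps=2_000_000)  # same 2M-step budget as the command above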
Hello, thanks =) I will give it a try to see if the results can be reproduced with different seeds. (It also seems you used twice the budget of the pretrained agent available in the zoo: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/benchmark.md)
I don't quite follow your statement "it also seems you used twice the budget of the pretrained agent". Which budget are you referring to?
I meant the number of timesteps used to train the agent: 1M steps for the pretrained agent vs. 2M according to the command line you wrote above.