rl-baselines3-zoo
[Question] hyperparameter optimization: objective of optuna study
❓ Question
Hi,
I’ve been adapting your code to run PPO hyperparameter optimization on my custom environment, and I have a question about the evaluation metric used as the optimization objective.
In `exp_manager.py`, on line 810, I noticed that the optimization objective is defined as:
```python
reward = eval_callback.last_mean_reward
```
This means that only the last evaluation determines the trial's score. Is there a specific reason for this approach? Would you consider using `reward = eval_callback.best_mean_reward` instead?
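For reference, here is a minimal, self-contained sketch of the kind of Optuna objective I mean (not RL Zoo's actual code; `CartPole-v1`, the sampled hyperparameter ranges, and the training/evaluation budgets are placeholders for illustration):

```python
import gymnasium as gym
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback


def objective(trial: optuna.Trial) -> float:
    # Sample a couple of hyperparameters, for illustration only.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)

    model = PPO(
        "MlpPolicy",
        gym.make("CartPole-v1"),
        learning_rate=learning_rate,
        gamma=gamma,
        verbose=0,
    )

    # Evaluate the policy periodically during training (eval_freq is in env steps).
    eval_callback = EvalCallback(
        gym.make("CartPole-v1"), n_eval_episodes=5, eval_freq=2_500
    )
    model.learn(total_timesteps=10_000, callback=eval_callback)

    # The trial score comes from the *last* periodic evaluation; the
    # alternative I am asking about would instead be
    # `return eval_callback.best_mean_reward` (best evaluation so far).
    return eval_callback.last_mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)
```

With `last_mean_reward`, a trial that peaked mid-training but degraded by the final evaluation is scored low, whereas `best_mean_reward` would score it by its peak.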
Checklist
- [X] I have checked that there is no similar issue in the repo
- [X] I have read the SB3 documentation
- [X] I have read the RL Zoo documentation
- [X] If code there is, it is minimal and working
- [X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.