rl-baselines3-zoo
[Question] hyperparameter optimization: objective of optuna study
❓ Question
Hi,
I’ve been adapting your code to run PPO hyperparameter optimization on my custom environment, and I have a question about the evaluation metric used as the optimization objective.
In `exp_manager.py`, on line 810, I noticed that the optimization objective is defined as:
```python
reward = eval_callback.last_mean_reward
```
This means that only the last evaluation determines the trial's score. Is there a specific reason for this approach? Would you consider using `reward = eval_callback.best_mean_reward` instead?
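For reference, here is a minimal, self-contained sketch of the kind of Optuna objective I mean (not RL Zoo's actual code; `CartPole-v1`, the sampled hyperparameter ranges, and the training/evaluation budgets are placeholders for illustration):

```python
import gymnasium as gym
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback


def objective(trial: optuna.Trial) -> float:
    # Sample a couple of hyperparameters, for illustration only.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)

    model = PPO(
        "MlpPolicy",
        gym.make("CartPole-v1"),
        learning_rate=learning_rate,
        gamma=gamma,
        verbose=0,
    )

    # Evaluate the policy periodically during training (eval_freq is in env steps).
    eval_callback = EvalCallback(
        gym.make("CartPole-v1"), n_eval_episodes=5, eval_freq=2_500
    )
    model.learn(total_timesteps=10_000, callback=eval_callback)

    # The trial score comes from the *last* periodic evaluation; the
    # alternative I am asking about would instead be
    # `return eval_callback.best_mean_reward` (best evaluation so far).
    return eval_callback.last_mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)
```

With `last_mean_reward`, a trial that peaked mid-training but degraded by the final evaluation is scored low, whereas `best_mean_reward` would score it by its peak.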
Checklist
- [X] I have checked that there is no similar issue in the repo
- [X] I have read the SB3 documentation
- [X] I have read the RL Zoo documentation
- [X] If code there is, it is minimal and working
- [X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.