
[Question] hyperparameter optimization: objective of optuna study

Open bias-ster opened this issue 6 months ago • 0 comments

❓ Question

Hi,

I’ve been adapting your code for PPO hyperparameter optimization for my custom environment and I have a question regarding the evaluation metric used.

In `exp_manager.py`, on line 810, I noticed that the optimization objective is defined as `reward = eval_callback.last_mean_reward`.

This means that only the last evaluation of a trial determines its objective value, so a trial that peaked earlier and then degraded is scored by its final result rather than its best one. Is there a specific reason for this approach? Would you consider using `reward = eval_callback.best_mean_reward` instead?
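To make the difference concrete, here is a minimal sketch that does not use the zoo's code. `EvalTracker` is a hypothetical stand-in that only mimics the two attributes mentioned above, and the evaluation curves are made-up numbers, not results from my environment.

```python
import math
from dataclasses import dataclass


@dataclass
class EvalTracker:
    """Hypothetical stand-in mimicking the two attributes discussed above."""

    last_mean_reward: float = -math.inf
    best_mean_reward: float = -math.inf

    def record(self, mean_reward: float) -> None:
        # The last value is overwritten on every evaluation,
        # the best value only when the new result is higher.
        self.last_mean_reward = mean_reward
        self.best_mean_reward = max(self.best_mean_reward, mean_reward)


def run_trial(eval_rewards):
    """Feed a sequence of periodic evaluation results into a tracker."""
    tracker = EvalTracker()
    for reward in eval_rewards:
        tracker.record(reward)
    return tracker


# Made-up evaluation curves for two trials (assumption, for illustration only).
trials = {
    "A": [10.0, 50.0, 200.0, 40.0],   # peaks, then degrades
    "B": [10.0, 60.0, 120.0, 150.0],  # improves steadily
}

results = {name: run_trial(curve) for name, curve in trials.items()}

# Pick the winning trial under each candidate objective.
best_by_last = max(results, key=lambda n: results[n].last_mean_reward)
best_by_best = max(results, key=lambda n: results[n].best_mean_reward)

print("winner using last_mean_reward:", best_by_last)
print("winner using best_mean_reward:", best_by_best)
```

With these numbers the two objectives select different trials, which is the case I am asking about. I realise the last value may be preferred because it reflects the final model rather than a single possibly lucky evaluation, but I would like to confirm.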


bias-ster, Aug 23 '24 14:08