Antonin RAFFIN
> I suggest to add argument for user to specify the number of retries

Sounds good, please go ahead with a PR ;) (don't forget to take a look at...
Let's wait for it to be non-experimental. In the meantime, you can always use that feature if you have a fork of the RL Zoo ;)
> My evaluation metric is the mean reward of 100 episodes.

You should do an evaluation on a test env after training, using `deterministic=True` (especially for SAC and DDPG).

> PPO2 is...
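For reference, a minimal sketch of such an evaluation, assuming Stable-Baselines3 (env id and timestep budget are placeholders; the old Stable-Baselines API is very similar):

```python
import gymnasium as gym  # use `import gym` with older versions
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

model = SAC("MlpPolicy", "Pendulum-v1", verbose=0)
model.learn(total_timesteps=20_000)

# Evaluate on a separate test env, with deterministic actions
eval_env = gym.make("Pendulum-v1")
mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=100, deterministic=True
)
print(f"Mean reward over 100 episodes: {mean_reward:.2f} +/- {std_reward:.2f}")
```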
> I would not have enough time to run even 100 trials

Usually, you don't run hyperparameter tuning on the full budget. You can try on one quarter of it,...
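As a rough illustration only (this is not the RL Zoo's actual tuning code; the budget and the tuned hyperparameter are placeholders), each Optuna trial can be trained on a fraction of the final budget:

```python
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

FULL_BUDGET = 1_000_000
TUNING_BUDGET = FULL_BUDGET // 4  # tune on a quarter of the final training budget

def objective(trial: optuna.Trial) -> float:
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    model = PPO("MlpPolicy", "CartPole-v1", learning_rate=learning_rate, verbose=0)
    # Each trial only gets the reduced budget
    model.learn(total_timesteps=TUNING_BUDGET)
    mean_reward, _ = evaluate_policy(
        model, model.get_env(), n_eval_episodes=20, deterministic=True
    )
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```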
> Thus, I think the low performance for off-policy methods might not relate to that setting.

The predict method is only used for testing; for training, all policies are stochastic (don't...
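A minimal sketch of that distinction, assuming Stable-Baselines3 (env and budget are placeholders):

```python
from stable_baselines3 import SAC

model = SAC("MlpPolicy", "Pendulum-v1", verbose=0)
model.learn(total_timesteps=5_000)  # training: actions are sampled from the stochastic policy

env = model.get_env()
obs = env.reset()
# Exploration-style action (sampled, as during training)
stochastic_action, _ = model.predict(obs, deterministic=False)
# Test-time action (deterministic, important when evaluating SAC/DDPG)
deterministic_action, _ = model.predict(obs, deterministic=True)
```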
HER is only in the master branch for now. It will be released soon (that is why the Docker image does not work yet), so you need to install SB from...
Hello, thanks for the PR, I will try to have a look this week if I have time. In the meantime, please add the user attribute with the correct values (makes...
> Since I use SubProcEnv for parallel rollouts, it is not efficient to send the policy to env (reward function) frequently due to communication overhead.

On my stack...
> Can I start with this? Can you provide some help regarding this

Yes please =). Best is to take a look at https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/rl_zoo3/plots/plot_train.py for the training plots. We could...
@theSquaredError are you still working on this?