Antonin RAFFIN

Results: 880 comments by Antonin RAFFIN

Duplicate of https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/76

@dylanprins I would be happy to share the link in the doc if you could open source your implementation ;)

> Now that the lstm_states come directly from the buffer, rather than being computed from the start, doesn't that mean that the backpropagation through time procedure merely goes back for...

> But I'm still a little confused, because from my perspective, the sampled obs should be of the shape (batch_size, history_length, obs_dim)

Actually no, the main reason is that you...

Hello, good point, I did that mainly to save space, but you are right, we should give the ability to save the stats for each checkpoint too. The other thing is that...

> Saving each checkpoint stats may cost space and, in general, is not necessary. Probably, it is reasonable and practical to save the stats of best_model.zip and {env_id}.zip.

I would...
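
For illustration, a minimal sketch of saving the VecNormalize statistics next to each checkpoint; the callback name, paths and save frequency are made up and this is not the RL Zoo implementation (recent SB3 releases also expose a similar option on `CheckpointCallback`):

```python
import os

import gymnasium as gym  # `import gym` for SB3 < 2.0
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize


class CheckpointWithStatsCallback(BaseCallback):
    """Save the model and the VecNormalize statistics every `save_freq` steps."""

    def __init__(self, save_freq: int, save_path: str, verbose: int = 0):
        super().__init__(verbose)
        self.save_freq = save_freq
        self.save_path = save_path

    def _on_step(self) -> bool:
        if self.n_calls % self.save_freq == 0:
            os.makedirs(self.save_path, exist_ok=True)
            self.model.save(os.path.join(self.save_path, f"model_{self.num_timesteps}_steps"))
            # Save the normalization statistics next to the checkpoint,
            # so model and stats can be reloaded together later on
            vec_normalize = self.model.get_vec_normalize_env()
            if vec_normalize is not None:
                vec_normalize.save(
                    os.path.join(self.save_path, f"vecnormalize_{self.num_timesteps}_steps.pkl")
                )
        return True


env = VecNormalize(DummyVecEnv([lambda: gym.make("Pendulum-v1")]))
model = PPO("MlpPolicy", env, verbose=0)
model.learn(10_000, callback=CheckpointWithStatsCallback(save_freq=2_000, save_path="./checkpoints"))
```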

Hello, it depends on what you want. For instance, if you return `best_mean_reward`, it will favor trials that reach high values but are not necessarily stable:

![gh_issue](https://user-images.githubusercontent.com/1973948/190606706-b76eef7a-c3a8-4f8a-95d0-8ec0a3e72b2f.png)

Returning `best_mean_reward` will...
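
As a hedged sketch of the difference, an Optuna objective that returns the mean reward of the last evaluation instead of `best_mean_reward`; the search space and budgets below are purely illustrative and this is not the exp_manager code:

```python
import gymnasium as gym  # `import gym` for SB3 < 2.0
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback


def objective(trial: optuna.Trial) -> float:
    # Toy search space, not the RL Zoo one
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    model = PPO("MlpPolicy", "CartPole-v1", learning_rate=learning_rate, verbose=0)
    eval_callback = EvalCallback(gym.make("CartPole-v1"), eval_freq=5_000, n_eval_episodes=10)
    model.learn(50_000, callback=eval_callback)
    # Returning eval_callback.best_mean_reward would favor trials that peak once
    # and then collapse; the reward at the last evaluation is a more robust objective
    return eval_callback.last_mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
```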

> Indeed I agree it should be user-configurable.

I would be happy to receive a PR that adds this parameter ;)

> I've been customizing exp_manager for my application, so adding early stopping to TrialEvalCallback

You should be able to do that directly in the yaml file, no need to modify...
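
A minimal sketch of early stopping without touching exp_manager, using SB3's `StopTrainingOnNoModelImprovement` attached to an `EvalCallback`; the thresholds are illustrative, and in the RL Zoo the same kind of callback can normally be declared under the `callback:` entry of the hyperparameter yaml file instead of in Python:

```python
import gymnasium as gym  # `import gym` for SB3 < 2.0
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnNoModelImprovement

# Stop training when there is no new best mean reward for 5 consecutive evaluations
# (checked only after the first 10 evaluations)
stop_callback = StopTrainingOnNoModelImprovement(max_no_improvement_evals=5, min_evals=10, verbose=1)
eval_callback = EvalCallback(
    gym.make("Pendulum-v1"),
    eval_freq=5_000,
    n_eval_episodes=10,
    callback_after_eval=stop_callback,
)

model = PPO("MlpPolicy", "Pendulum-v1", verbose=0)
model.learn(200_000, callback=eval_callback)
```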

Replying from https://github.com/DLR-RM/stable-baselines3/issues/1045

> The default value for `deterministic_eval = not self.is_atari(env_id)` in exp_manager.py is True for my own custom env. So evaluating for more episodes will give the same...
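
To illustrate the point, a small sketch with `evaluate_policy`; the env is a placeholder, and the first comment only holds when the policy, the dynamics and the initial state are all deterministic:

```python
import gymnasium as gym  # `import gym` for SB3 < 2.0
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

model = PPO("MlpPolicy", "CartPole-v1", verbose=0).learn(20_000)
eval_env = gym.make("CartPole-v1")

# If the policy, the env dynamics and the initial state are all deterministic,
# every evaluation episode is identical, so more episodes do not change the estimate
mean_det, std_det = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
# Sampling actions from the policy instead gives a noisy but non-degenerate estimate
mean_sto, std_sto = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=False)
print(f"deterministic: {mean_det:.1f} +/- {std_det:.1f}, stochastic: {mean_sto:.1f} +/- {std_sto:.1f}")
```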

> [...] Therefore when enjoying the best_model, it will load the last pkl and give a very different reward if the statistics in the two pkls differ too much.

Related...
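
A minimal sketch of loading matching statistics, assuming a `best_model_vecnormalize.pkl` was saved at the same time as `best_model.zip` (both filenames are illustrative):

```python
import gymnasium as gym  # `import gym` for SB3 < 2.0
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# The normalization statistics must match the checkpoint being loaded:
# pairing best_model.zip with the stats saved at the end of training can
# give a very different reward if the running statistics drifted in between
venv = DummyVecEnv([lambda: gym.make("Pendulum-v1")])
env = VecNormalize.load("logs/best_model_vecnormalize.pkl", venv)
env.training = False     # freeze the running statistics at test time
env.norm_reward = False  # report the raw episode reward

model = PPO.load("logs/best_model.zip", env=env)
```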