Antonin RAFFIN comments

Results: 880 comments of Antonin RAFFIN

[bug] PPO2 episode reward summaries are written incorrectly for VecEnvs

>Thanks for the response. I am assuming the TensorBoard logging issue for multiple envs has been resolved in SB3. For a more complete answer, you can use the "legacy" logging in...

[question] PPO2 pretrain always resets weights?

>I don't think I was involved with this code, or if I was I've forgotten about it ;) Yes, that was me. But @AdamGleave has a better repo for imitation...

GAIL throws error when obs space is MultiDiscrete

Hello, the doc should probably be updated (but we are having issues with Travis now :/). Anyway, I would recommend you use the GAIL implementation from https://github.com/HumanCompatibleAI/imitation which is better maintained (and...

Upgrade to Tensorflow 2

There is an unofficial version with TF2 support here if you want: https://github.com/hill-a/stable-baselines/issues/984

Upgrade to Tensorflow 2

> TensorFlow 2 definitely needs to be supported. Nobody is going to use TF 1, if you have the opportunity to use TF 2. @nbro you should probably take a...

Upgrade to Tensorflow 2

>given that TF seems to be a lot more used than PyTorch, although PyTorch is also used in research and, apparently, most people voted for PyTorch in your poll. We...

Beta distribution as policy for environments with bounded continuous action spaces [feature request]

Hello, > Is it possible to add a beta distribution to the repository? This is not planned, but we are open to PRs. Also, as for the Huber loss (see...

Beta distribution as policy for environments with bounded continuous action spaces [feature request]

I would say you need some hyperparameter tuning... The parameters present in the current implementation were tuned for Gaussian policies, so it is not completely fair to compare them without...

Beta distribution as policy for environments with bounded continuous action spaces [feature request]

The best practice would be to use hyperband or hyperopt to do it automatically (see https://github.com/araffin/robotics-rl-srl#hyperparameter-search). This [script](https://github.com/araffin/robotics-rl-srl/blob/master/rl_baselines/hyperparam_search.py) written by @hill-a can get you started. Otherwise, with PPO, the hyperparameters...

Beta distribution as policy for environments with bounded continuous action spaces [feature request]

@antoine-galataud before submitting a PR, please look at the contribution guide https://github.com/hill-a/stable-baselines/pull/148 (that would save time ;)) It will be merged with master soon.