Antonin RAFFIN comments

Results 880 comments of Antonin RAFFIN

[Bug] An error in MaskPPO training

Good to hear =) Then I would be happy to receive a PR that solves this issue ;)

[Bug] An error in MaskPPO training

Hello, > Could you elaborate why do you think, that removing the probs is not a good idea? The idea behind it is to use a feature that is in...

[feature request] Parameterized action spaces.

Hello, are you proposing to implement it, or is it a request? If it is a request, please add a bit more motivation of why this algorithm is needed, otherwise...

[Feature Request] Better support for action masking for vectorized environments

Hello, this is a duplicate of https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/49#issuecomment-957629253. We would appreciate a PR that solves this issue ;)

Change from gamma=0.4 to default in Example Docu

> At the very least, there should be a warning about this as it is a very different value from what you usually use

yes, I would be for a...

[Feature Request] Implement MBPO algorithm

Hello, I've got mixed feelings about that one. It would be interesting to have it, at least in a separate repo. However, the current focus of SB3 is model free...

[Feature Request] Implement MBPO algorithm

closing as out of scope of SB3 (we focus on model-free RL), however, if you want a model-free RL algorithm that is as sample efficient as MBPO, you can take...

ARS Returns 0 actions on evaluation env

Hello, > model we obtain valid actions but during evaluation we obtain actions of 0. Both the evaluation and train environment are the same except for using different but similar...

Custom network with image augmentation layer

Hello, Can you elaborate a bit more, what do you want to implement exactly? Where should it be included? Which algorithms do you plan to support?

Custom network with image augmentation layer

>The reason for applying the augmentation prior to the network instead of as a wrapper is to make best use of multiple pass-throughs of the data. e.g. instead of storing...