Antonin RAFFIN

Results: 880 comments by Antonin RAFFIN

It is actually documented in #780 (and the env checker is updated there). We should probably cherry-pick those changes.

Hello,

> From my understanding of the code and the documentation, I would answer the question with no

Yes, it should not (and it does not for built-in Gym envs)...

Hello,

> But that means that I have to adhere to sampling actions with a normal distribution (in the case of Box). I would like to test a different distribution...

> a half-Gaussian distribution?

Looks OK; I'm just wondering in which context you would need a half-normal distribution?
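For context, SB3 samples `Box` actions from a diagonal Gaussian by default; a half-normal is available directly in PyTorch if one wants to experiment outside SB3. A minimal sketch (not SB3's API; plugging this into SB3 would require a custom `Distribution` subclass):

```python
import torch as th

# Default SB3 behaviour for Box action spaces: a diagonal Gaussian.
mean, std = th.zeros(2), th.ones(2)
gaussian_action = th.distributions.Normal(mean, std).sample()

# A half-normal only produces non-negative samples.
half_normal_action = th.distributions.HalfNormal(std).sample()
print(gaussian_action, half_normal_action)
```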

Hello, that sounds reasonable (even though I doubt changing the activation per layer will make a big difference). Could you do a draft PR to see how much complexity it adds?...
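For reference, this is how a single activation function is configured today via `policy_kwargs` (a minimal sketch assuming a recent SB3 version, where `net_arch` takes the dict form); the proposal above would relax the "one activation for all hidden layers" constraint:

```python
import torch as th

from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(
        activation_fn=th.nn.Tanh,  # currently shared by all hidden layers
        net_arch=dict(pi=[64, 64], vf=[64, 64]),
    ),
)
model.learn(total_timesteps=10_000)
```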

> have a final softmax layer in the actor network

I see; in that case there is a misunderstanding, but this is already the case for PPO and discrete...
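To illustrate the point: for discrete actions, the actor network outputs raw logits and the categorical distribution applies the softmax internally, so the probabilities are already softmax-normalised. A simplified sketch of the idea (not SB3's exact code):

```python
import torch as th

logits = th.tensor([2.0, 0.5, -1.0])  # raw outputs of the actor network
dist = th.distributions.Categorical(logits=logits)  # softmax applied internally
action = dist.sample()
print(dist.probs, dist.log_prob(action))  # probabilities sum to 1
```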

> is used in all the other layers of both the policy net and the value net. Is that correct?

Yes.

Hello,

> However, it is sometimes more sensible to report the discounted return:

Could you elaborate where/when you would like to do that and why?

> This will give the...
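For clarity, the discounted return under discussion is the quantity G = r_0 + gamma * r_1 + gamma^2 * r_2 + ...; a minimal, purely illustrative way to compute it from one episode's rewards:

```python
def discounted_return(rewards: list, gamma: float = 0.99) -> float:
    # G = r_0 + gamma * r_1 + gamma**2 * r_2 + ...
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Three steps of reward 1.0 with gamma=0.9: 1 + 0.9 + 0.81 = 2.71
print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))
```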

PS: `callback` argument is used here in `EvalCallback`: https://github.com/DLR-RM/stable-baselines3/blob/d64bcb401ad7d45799af1feee5c1058943be23f0/stable_baselines3/common/callbacks.py#L401
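A minimal usage sketch of that mechanism, based on SB3's documented `EvalCallback` API (assuming a recent SB3 version with Gymnasium; the child callback, here passed as `callback_on_new_best`, is presumably the one invoked at the linked line):

```python
import gymnasium as gym

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnRewardThreshold

# The child callback is triggered whenever evaluation finds a new best mean reward.
eval_env = gym.make("CartPole-v1")
stop_callback = StopTrainingOnRewardThreshold(reward_threshold=450, verbose=1)
eval_callback = EvalCallback(eval_env, callback_on_new_best=stop_callback, eval_freq=5_000)

model = PPO("MlpPolicy", "CartPole-v1")
model.learn(total_timesteps=50_000, callback=eval_callback)
```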

> In most cases, this is what the algorithm is optimizing. It is useful to see the progress of training relative to the actual objective.

Then maybe the right place...