Antonin RAFFIN

Results: 880 comments by Antonin RAFFIN

It is actually documented in #780 (and the env checker is updated there). We should probably cherry-pick those changes.

Hello,

> From my understanding of the code and the documentation, I would answer the question with no

Yes, it should not (and it does not for built-in Gym envs)...

Hello,

> But that means that I have to adhere to sampling actions with a normal distribution (in the case of Box). I would like to test a different distribution...

> a half-Gaussian distribution?

Looks OK; I'm just wondering in which context you would need a half-normal distribution?
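For context, SB3 samples `Box` actions from a diagonal Gaussian by default; a half-normal is available directly in PyTorch if one wants to experiment outside SB3. A minimal sketch (not SB3's API; plugging this into SB3 would require a custom `Distribution` subclass):

```python
import torch as th

# Default SB3 behaviour for Box action spaces: a diagonal Gaussian.
mean, std = th.zeros(2), th.ones(2)
gaussian_action = th.distributions.Normal(mean, std).sample()

# A half-normal only produces non-negative samples.
half_normal_action = th.distributions.HalfNormal(std).sample()
print(gaussian_action, half_normal_action)
```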

Hello, that sounds reasonable (even though I doubt changing the activation per layer will make a big difference). Could you do a draft PR to see how much complexity it adds?...
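For reference, this is how a single activation function is configured today via `policy_kwargs` (a minimal sketch assuming a recent SB3 version, where `net_arch` takes the dict form); the proposal above would relax the "one activation for all hidden layers" constraint:

```python
import torch as th

from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(
        activation_fn=th.nn.Tanh,  # currently shared by all hidden layers
        net_arch=dict(pi=[64, 64], vf=[64, 64]),
    ),
)
model.learn(total_timesteps=10_000)
```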

> have a final softmax layer in the actor network

I see; in that case there is a misunderstanding, but this is already the case for PPO and discrete...
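To illustrate the point: for discrete actions, the actor network outputs raw logits and the categorical distribution applies the softmax internally, so the probabilities are already softmax-normalised. A simplified sketch of the idea (not SB3's exact code):

```python
import torch as th

logits = th.tensor([2.0, 0.5, -1.0])  # raw outputs of the actor network
dist = th.distributions.Categorical(logits=logits)  # softmax applied internally
action = dist.sample()
print(dist.probs, dist.log_prob(action))  # probabilities sum to 1
```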

> is used in all the other layers of both the policy net and the value net. Is that correct?

Yes.

Hello,

> However, it is sometimes more sensible to report the discounted return:

Could you elaborate where/when you would like to do that and why?

> This will give the...
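For clarity, the discounted return under discussion is the quantity G = r_0 + gamma * r_1 + gamma^2 * r_2 + ...; a minimal, purely illustrative way to compute it from one episode's rewards:

```python
def discounted_return(rewards: list, gamma: float = 0.99) -> float:
    # G = r_0 + gamma * r_1 + gamma**2 * r_2 + ...
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Three steps of reward 1.0 with gamma=0.9: 1 + 0.9 + 0.81 = 2.71
print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))
```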

PS: `callback` argument is used here in `EvalCallback`: https://github.com/DLR-RM/stable-baselines3/blob/d64bcb401ad7d45799af1feee5c1058943be23f0/stable_baselines3/common/callbacks.py#L401
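A minimal usage sketch of that mechanism, based on SB3's documented `EvalCallback` API (assuming a recent SB3 version with Gymnasium; the child callback, here passed as `callback_on_new_best`, is presumably the one invoked at the linked line):

```python
import gymnasium as gym

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnRewardThreshold

# The child callback is triggered whenever evaluation finds a new best mean reward.
eval_env = gym.make("CartPole-v1")
stop_callback = StopTrainingOnRewardThreshold(reward_threshold=450, verbose=1)
eval_callback = EvalCallback(eval_env, callback_on_new_best=stop_callback, eval_freq=5_000)

model = PPO("MlpPolicy", "CartPole-v1")
model.learn(total_timesteps=50_000, callback=eval_callback)
```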

> In most cases, this is what the algorithm is optimizing. It is useful to see the progress of training relative to the actual objective.

Then maybe the right place...