Antonin RAFFIN
> I didn't know you could do that with callbacks. Yes, callbacks & wrappers are quite powerful... > It would be interesting if it was documented. Well, I didn't have...
Hello, I'm actively using the callback. The important thing to check is comparing training runs with the same number of gradient updates, and also comparing how long it takes to...
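For instance, a minimal sketch of such a timing check with a custom callback (the callback name, env id and hyperparameters are just placeholders, not part of the discussion above):

```python
import time

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import BaseCallback


class TimingCallback(BaseCallback):
    """Hypothetical callback: record wall-clock time for a run, so runs with
    the same number of gradient updates can be compared fairly."""

    def _on_training_start(self) -> None:
        self.start_time = time.time()

    def _on_step(self) -> bool:
        # Returning True keeps training going
        return True

    def _on_training_end(self) -> None:
        elapsed = time.time() - self.start_time
        print(f"{self.num_timesteps} env steps in {elapsed:.1f}s")


model = SAC("MlpPolicy", "Pendulum-v1", gradient_steps=1, verbose=0)
model.learn(total_timesteps=5_000, callback=TimingCallback())
```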
Hello, Why not, but I would say you already have one: >`action = agent.act(state)` It is called `predict()`: `action, _ = agent.predict(state)` >`agent.record(state, action, next_state, reward, done, info)` As mentioned...
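Something like this minimal sketch (assuming the classic Gym API; the env and the number of steps are placeholders):

```python
import gym

from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
agent = PPO("MlpPolicy", env, verbose=0)

obs = env.reset()
for _ in range(100):
    # Equivalent of `action = agent.act(state)` from the question
    action, _states = agent.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```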
> Are these available in SB2 as well? Yes for `predict()` (cf. the doc). For the rest, more or less; it is a bit messier. The `train()` corresponds to the...
Several updates regarding this issue: > Agreed, this question came up in the SB repository quite often. Another related thing we could do is to make getting action probabilities/values a bit easier...
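For reference, a rough sketch of how one can already get at those quantities (assuming the `get_distribution()` and `predict_values()` helpers that recent SB3 versions expose on `ActorCriticPolicy`; env and model are placeholders):

```python
import torch as th

from stable_baselines3 import PPO
from stable_baselines3.common.utils import obs_as_tensor

model = PPO("MlpPolicy", "CartPole-v1", verbose=0)
obs = model.env.reset()

with th.no_grad():
    obs_tensor = obs_as_tensor(obs, model.policy.device)
    # Action distribution for the current observation
    dist = model.policy.get_distribution(obs_tensor)
    action_probs = dist.distribution.probs  # categorical action probabilities
    # Value estimate from the critic
    value = model.policy.predict_values(obs_tensor)
```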
Hello, > This goes against the recommendations of Revisiting the Arcade Learning Environment (https://arxiv.org/pdf/1709.06009.pdf). Yes, I'm aware of that. We kept it to be able to compare results against SB2....
> The current AtariWrapper by default has `terminate_on_life_loss` set to True. This goes against the recommendations of Revisiting the Arcade Learning Environment (https://arxiv.org/pdf/1709.06009.pdf). I believe this should be set to...
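For reference, a sketch of how to override that default (note: the keyword in the SB3 wrapper is spelled `terminal_on_life_loss`, and `make_atari_env` forwards `wrapper_kwargs` to `AtariWrapper`; this assumes the Atari extras are installed):

```python
import gym

from stable_baselines3.common.atari_wrappers import AtariWrapper
from stable_baselines3.common.env_util import make_atari_env

# Wrapping a single env directly
env = AtariWrapper(gym.make("BreakoutNoFrameskip-v4"), terminal_on_life_loss=False)

# Or through the vectorized helper
vec_env = make_atari_env(
    "BreakoutNoFrameskip-v4",
    n_envs=4,
    wrapper_kwargs=dict(terminal_on_life_loss=False),
)
```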
Related: https://github.com/hill-a/stable-baselines/issues/463 I need to think more about it, but for now I would prefer that users define custom policies and train methods (related to #55 though) rather than changing...
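As an illustration of the custom-policy route, a minimal sketch (SB3-style, with arbitrary architecture values):

```python
from stable_baselines3 import PPO

# Customize the policy network via policy_kwargs instead of patching the library
policy_kwargs = dict(net_arch=[64, 64])
model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=0)
model.learn(total_timesteps=10_000)
```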
Hello, > Observation after `env.reset()` should be the same, i.e. Image 1 should be equal to Image 2 Why? Calling `reset()` means starting a new episode, so you have the...
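A quick illustration (assuming a standard Gym env with a randomized initial state and the classic reset API that returns only the observation):

```python
import gym

env = gym.make("CartPole-v1")
obs_1 = env.reset()
obs_2 = env.reset()
# Each reset() starts a fresh episode: with a random initial state,
# the two observations will generally differ.
print(obs_1)
print(obs_2)
```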