Astraea Quinn S
That makes sense. It feels too restrictive though, doesn't it? It increases nesting by an additional level. I think we can close this as not to be implemented, but I'd...
I am not sure if this is the correct approach. In RND the critic network uses two Value heads to estimate the two reward streams so implementing it as a...
I agree with you. Perhaps both streams could be available in the `info` dict? That would be quite useful for evaluating performance and debugging the algorithm.
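To make the `info` suggestion concrete, here is a rough sketch of a wrapper that exposes both reward streams. It assumes a Gym-style `step(action)` returning `(obs, reward, done, info)`; the class name `RewardStreamsWrapper`, the `intrinsic_reward_fn` callable and the keys `extrinsic_reward` / `intrinsic_reward` are hypothetical names picked for illustration, not part of any library.

```python
class RewardStreamsWrapper:
    """Hypothetical wrapper around an env-like object with Gym-style reset()/step()."""

    def __init__(self, env, intrinsic_reward_fn):
        self.env = env
        # Assumed callable: maps an observation to a scalar intrinsic reward.
        self.intrinsic_reward_fn = intrinsic_reward_fn

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, extrinsic, done, info = self.env.step(action)
        intrinsic = self.intrinsic_reward_fn(obs)
        # Copy so the wrapped env's own dict is not mutated.
        info = dict(info)
        info["extrinsic_reward"] = extrinsic
        info["intrinsic_reward"] = intrinsic
        # The returned reward is left as the extrinsic one; how the two
        # streams are combined is up to the algorithm.
        return obs, extrinsic, done, info
```

With this, a logging callback or evaluation script could read both values from `info` without touching the training code.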
> You could find better answers in [the docs](https://stable-baselines.readthedocs.io/en/master/guide/rl_tips.html) or in [OpenAI SpinningUp](https://spinningup.openai.com/en/latest/). Note, the Spinning Up implementation uses the approximate KL divergence as an early stop mechanism. This isn't...
It does make sense when you are implementing hierarchical agents. To be exact, I am using many individual agents in a structure like 'feudal learning' and I need a way...
Indeed, DQN learns every `n_step`; however, it compares against `self.num_timesteps`, so it does work.
```python
if can_sample and self.num_timesteps > self.learning_starts \
        and self.num_timesteps % self.train_freq == 0:
```
The...
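As a quick sanity check of that condition, here is a standalone sketch that pulls it into a plain function and lists the timesteps at which a training step would fire. The function name `should_train` and the example values (`learning_starts=10`, `train_freq=4`) are made up for illustration; only the boolean expression mirrors the snippet above.

```python
def should_train(num_timesteps, learning_starts, train_freq, can_sample=True):
    # Same condition as in the snippet, with the attributes passed in as arguments.
    return (can_sample
            and num_timesteps > learning_starts
            and num_timesteps % train_freq == 0)


# Illustrative values only: warm up for 10 steps, then train every 4th step.
train_steps = [t for t in range(1, 21)
               if should_train(t, learning_starts=10, train_freq=4)]
print(train_steps)  # [12, 16, 20]
```

Because the modulo is taken on the global step counter, the update frequency does not depend on how many steps are taken per call.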
I agree with you. Perhaps an extra argument in `learn`? For example, `explore_over_timesteps` that is by default set to None. If it is None it takes `exploration_fraction*total_timesteps`, keeping the default...
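A minimal sketch of how that fallback could look, assuming a linear epsilon schedule. `explore_over_timesteps` is the proposed (not existing) argument; the helper names `exploration_steps` and `epsilon_at` and the default epsilon values are hypothetical.

```python
from typing import Optional


def exploration_steps(total_timesteps: int,
                      exploration_fraction: float = 0.1,
                      explore_over_timesteps: Optional[int] = None) -> int:
    """Number of timesteps over which epsilon is annealed."""
    if explore_over_timesteps is None:
        # Default behaviour: a fraction of the total training budget.
        return int(exploration_fraction * total_timesteps)
    # Proposed override: an absolute number of timesteps.
    return int(explore_over_timesteps)


def epsilon_at(step: int, schedule_steps: int,
               initial_eps: float = 1.0, final_eps: float = 0.02) -> float:
    """Linear interpolation from initial_eps to final_eps, then constant."""
    fraction = min(float(step) / max(schedule_steps, 1), 1.0)
    return initial_eps + fraction * (final_eps - initial_eps)
```

Passing `explore_over_timesteps` would decouple the exploration schedule from `total_timesteps`, which matters when `learn` is called several times in a row.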
Thanks @araffin. Due to some other issues that I encountered, I will derive the classes. For future reference and anyone else that might encounter this, the callback doesn't have access...
It doesn't. I opened an issue on this.
@rustbot label +beta-nominated I think this should be merged upstream as well.