Astraea Quinn S

Results: 61 comments by Astraea Quinn S

That makes sense. It feels too restrictive though, doesn't it? It increases nesting by an additional level. I think we can close this as not to be implemented, but I'd...

I am not sure if this is the correct approach. In RND the critic network uses two Value heads to estimate the two reward streams so implementing it as a...

I agree with you, perhaps both streams could be available in the `info` dict? This will be quite useful wrt evaluating the performance and debugging the algorithm.
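A minimal sketch of what exposing both streams through `info` could look like. The helper name `attach_reward_streams` and the keys `extrinsic_reward` and `intrinsic_reward` are hypothetical, chosen for illustration only; they are not part of any library's API.

```python
from typing import Any, Dict, Optional


def attach_reward_streams(
    info: Optional[Dict[str, Any]],
    extrinsic: float,
    intrinsic: float,
) -> Dict[str, Any]:
    """Return a copy of `info` with both reward streams recorded.

    The key names are assumptions made for this sketch.
    """
    enriched = dict(info or {})  # copy so the caller's dict is not mutated
    enriched["extrinsic_reward"] = extrinsic
    enriched["intrinsic_reward"] = intrinsic
    return enriched


# Usage: log both streams for one step without touching the original dict.
step_info = {"episode_step": 3}
logged = attach_reward_streams(step_info, extrinsic=1.0, intrinsic=0.25)
```

Keeping the streams separate in `info` leaves the way they are combined (or estimated by separate value heads) up to the algorithm, while still making both visible for evaluation and debugging.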

> You could find better answers in [the docs](https://stable-baselines.readthedocs.io/en/master/guide/rl_tips.html) or in [OpenAI Spinning Up](https://spinningup.openai.com/en/latest/). Note, the Spinning Up implementation uses the approximate KL divergence as an early stop mechanism. This isn't...

It does make sense when you are implementing hierarchical agents. To be exact, I am using many individual agents in a structure like 'feudal learning' and I need a way...

Indeed, DQN learns every `n_step`, however, it compares against `self.num_timesteps` so it does work.

```python
if can_sample and self.num_timesteps > self.learning_starts \
        and self.num_timesteps % self.train_freq == 0:
```

The...
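To see which steps the quoted condition selects, here is a small standalone sketch. The function `should_train` is a hypothetical stand-in that mirrors the condition with plain arguments instead of attributes on `self`, and the values `learning_starts=10` and `train_freq=4` are arbitrary.

```python
def should_train(num_timesteps: int, learning_starts: int,
                 train_freq: int, can_sample: bool = True) -> bool:
    """Mirror of the quoted condition, using plain arguments."""
    return (
        can_sample
        and num_timesteps > learning_starts
        and num_timesteps % train_freq == 0
    )


# Steps 1..20 with illustrative settings: training starts after step 10
# and then fires on every multiple of 4.
update_steps = [
    t for t in range(1, 21)
    if should_train(t, learning_starts=10, train_freq=4)
]
print(update_steps)  # [12, 16, 20]
```

Because the modulo is taken on the global step counter, updates land on fixed multiples of `train_freq` regardless of when `learning_starts` is crossed.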

I agree with you. Perhaps an extra argument in `learn`? For example, `explore_over_timesteps` that is by default set to None. If it is None it takes `exploration_fraction*total_timesteps`, keeping the default...
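The proposed default behaviour might be sketched as follows. The helper `resolve_exploration_steps` is hypothetical, and the `exploration_fraction` default of `0.1` is only an illustrative value; the actual `learn` signature is not shown here.

```python
from typing import Optional


def resolve_exploration_steps(
    total_timesteps: int,
    exploration_fraction: float = 0.1,  # illustrative default
    explore_over_timesteps: Optional[int] = None,
) -> int:
    """Number of steps over which exploration is annealed.

    If `explore_over_timesteps` is None, fall back to
    `exploration_fraction * total_timesteps`.
    """
    if explore_over_timesteps is None:
        return int(exploration_fraction * total_timesteps)
    return explore_over_timesteps


# Fallback path: the fraction of the total budget is used.
fallback_steps = resolve_exploration_steps(1000, exploration_fraction=0.5)
# Explicit path: the caller pins the schedule length directly.
explicit_steps = resolve_exploration_steps(1000, explore_over_timesteps=2000)
```

With an explicit value, the exploration schedule no longer depends on `total_timesteps`, which is the point of the suggested argument.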

Thanks @araffin. Due to some other issues that I encountered, I will derive the classes. For future reference and anyone else that might encounter this, the callback doesn't have access...

@rustbot label +beta-nominated

I think this should be merged upstream as well.