Astraea Quinn S comments

Results: 61 comments of Astraea Quinn S

Single branch ifs in the body of a match arm should be made explicit

That makes sense. It feels too restrictive though, doesn't it? It increases nesting by an additional level. I think we can close this as not to be implemented, but I'd...

[Feature Proposal] Intrinsic Reward VecEnvWrapper

I am not sure if this is the correct approach. In RND, the critic network uses two value heads to estimate the two reward streams, so implementing it as a...

[Feature Proposal] Intrinsic Reward VecEnvWrapper

I agree with you. Perhaps both streams could be available in the `info` dict? This would be quite useful for evaluating performance and debugging the algorithm.
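A minimal sketch of that idea, assuming per-environment lists of rewards and `info` dicts like those a vectorized environment returns from `step`. The function name `combine_rewards`, the `intrinsic_coef` parameter, and the keys `extrinsic_reward` and `intrinsic_reward` are hypothetical and are not part of stable-baselines:

```python
def combine_rewards(extrinsic, intrinsic, infos, intrinsic_coef=1.0):
    """Mix two reward streams and keep both raw values in each info dict.

    All names here are illustrative assumptions, not a library API.
    """
    combined = []
    for r_ext, r_int, info in zip(extrinsic, intrinsic, infos):
        # Store the raw streams so evaluation and debugging can read them later.
        info["extrinsic_reward"] = float(r_ext)
        info["intrinsic_reward"] = float(r_int)
        # The agent trains on the weighted sum of the two streams.
        combined.append(r_ext + intrinsic_coef * r_int)
    return combined, infos


rewards, infos = combine_rewards([1.0, 0.0], [0.5, 0.25], [{}, {}], intrinsic_coef=2.0)
print(rewards)
print(infos[0])
```

Keeping the raw streams in `info` leaves the combined reward untouched for training, while a logger or evaluation script can still plot the extrinsic return on its own.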

PPO2 episode reward drops catastrophically during training

> You could find better answers in [the docs](https://stable-baselines.readthedocs.io/en/master/guide/rl_tips.html) or in [OpenAI SpinningUp](https://spinningup.openai.com/en/latest/). Note, the Spinning Up implementation uses the approximate KL divergence as an early-stopping mechanism. This isn't...

[Bug] DQN Exploration divides by 0 when learn steps are small

It does make sense when you are implementing hierarchical agents. To be exact, I am using many individual agents in a structure like 'feudal learning' and I need a way...

[Bug] DQN Exploration divides by 0 when learn steps are small

Indeed, DQN learns every `n_step`, however, it compares against `self.num_timesteps` so it does work.

```python
if can_sample and self.num_timesteps > self.learning_starts \
        and self.num_timesteps % self.train_freq == 0:
```

The...

[Bug] DQN Exploration divides by 0 when learn steps are small

I agree with you. Perhaps an extra argument in `learn`? For example, `explore_over_timesteps` that is by default set to None. If it is None it takes `exploration_fraction*total_timesteps`, keeping the default...
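The proposal could be sketched roughly as follows. This is an illustration only: `resolve_exploration_steps` and `linear_epsilon` are hypothetical helpers, the default values are assumptions, and none of this is the actual stable-baselines code. Only the names `explore_over_timesteps`, `exploration_fraction`, and `total_timesteps` come from the comment above.

```python
def resolve_exploration_steps(total_timesteps, exploration_fraction=0.1,
                              explore_over_timesteps=None):
    """Return the number of steps over which epsilon is annealed."""
    if explore_over_timesteps is None:
        # Fall back to the fraction-based default described in the comment.
        explore_over_timesteps = int(exploration_fraction * total_timesteps)
    # Clamp to at least 1 so a short `learn` call cannot yield a zero-length
    # schedule (the division-by-zero case this issue is about).
    return max(explore_over_timesteps, 1)


def linear_epsilon(step, schedule_timesteps, initial_eps=1.0, final_eps=0.02):
    """Linearly anneal epsilon from initial_eps to final_eps."""
    fraction = min(step / schedule_timesteps, 1.0)
    return initial_eps + fraction * (final_eps - initial_eps)


# A very short learn call: int(0.1 * 5) would be 0 without the clamp.
steps = resolve_exploration_steps(total_timesteps=5)
print(steps, linear_epsilon(0, steps))

# An explicit schedule length shared across many short learn calls.
steps = resolve_exploration_steps(total_timesteps=5, explore_over_timesteps=50)
print(steps, linear_epsilon(25, steps))
```

Passing an explicit schedule length lets a caller that invokes `learn` repeatedly for a few steps at a time anneal exploration over the whole run instead of over each short call.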

[Bug] DQN Exploration divides by 0 when learn steps are small

Thanks @araffin. Due to some other issues that I encountered, I will derive the classes. For future reference and anyone else that might encounter this, the callback doesn't have access...

[Bug] DQN Exploration divides by 0 when learn steps are small

It doesn't. I opened an issue on this.

Fix FP in `threadlocal!` when falling back to `os_local`

@rustbot label +beta-nominated I think this should be merged upstream as well.