Zhihan Yang

9 comments by Zhihan Yang

@araffin A single training run of an RL agent generates several monitor files (in my case, 3). Why is that? In other words, what does `stable_baselines3.common.monitor.load_results` do with them? I...
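For reference, a minimal sketch of the setup I have in mind, assuming a single `Monitor`-wrapped environment and a hypothetical `./logs/` directory (with a vectorized env, I believe each sub-environment writes its own `*.monitor.csv`, which may be where the multiple files come from):

```{python}
import os

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor, load_results

log_dir = "./logs/"  # hypothetical directory for this sketch
os.makedirs(log_dir, exist_ok=True)

# The Monitor wrapper writes per-episode stats to <prefix>.monitor.csv.
env = Monitor(gym.make("CartPole-v1"), filename=os.path.join(log_dir, "run0"))

model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)

# load_results scans the directory, reads every *.monitor.csv it finds,
# and concatenates them into a single pandas DataFrame sorted by time.
df = load_results(log_dir)
print(df[["r", "l", "t"]].head())  # episode reward, length, wall-clock time
```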

There are several levels to my answer:

- Strangely, I had to change the method names from `_action` and `_reverse_action` to `action` and `reverse_action` for the code to work (see the sketch below).
- ...
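For concreteness, this is roughly what the wrapper looks like after the rename; the class name and the rescaling logic here are my own sketch, not the repo's code. Newer gym versions dispatch to `action` / `reverse_action` directly, while the old underscore-prefixed hooks are no longer called.

```{python}
import gym


class NormalizedActions(gym.ActionWrapper):
    """Rescale agent actions from [-1, 1] to the env's [low, high] range."""

    def action(self, action):
        # Called by newer gym in place of the old `_action` hook.
        low, high = self.action_space.low, self.action_space.high
        return low + (action + 1.0) * 0.5 * (high - low)

    def reverse_action(self, action):
        # Inverse mapping, replacing the old `_reverse_action` hook.
        low, high = self.action_space.low, self.action_space.high
        return 2.0 * (action - low) / (high - low) - 1.0


# Env id depends on the installed gym version ("Pendulum-v0" on older gym).
env = NormalizedActions(gym.make("Pendulum-v1"))
```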

I think this is weird, too.

```{python}
agent.memory.append(
    observation,
    agent.select_action(observation),
    0.,     # reward hard-coded to zero
    False,  # done hard-coded to False
)
```

Also, `done` is set to `False` in this tuple, which is more perplexing.

Having said that, I think this probably has a negligible effect on learning, given how large the replay buffer is, but I think it's good for...
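For comparison, the conventional pattern I would expect is for the reward and the done flag to come from `env.step` rather than being hard-coded. The loop below is my own sketch (`env` and `num_steps` are placeholders); only `agent.memory.append` follows the signature quoted above.

```{python}
observation = env.reset()
for step in range(num_steps):
    action = agent.select_action(observation)
    next_observation, reward, done, info = env.step(action)

    # Store the reward and termination signal actually returned by the env,
    # not hard-coded placeholders.
    agent.memory.append(observation, action, reward, done)

    observation = env.reset() if done else next_observation
```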

I think the answer is yes.

```{python}
policy_loss = -self.critic([
    to_tensor(state_batch),
    self.actor(to_tensor(state_batch)),
])
policy_loss = policy_loss.mean()
policy_loss.backward()
self.actor_optim.step()
```

First of all, I think it is clear that we are...
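To make the gradient flow concrete, here is a small standalone sketch (the toy networks and optimizer setup are mine, not the repo's): backpropagating the negated mean Q-value populates gradients in both networks, but only the actor's parameters move, because only `actor_optim.step()` is called.

```{python}
import torch
import torch.nn as nn

# Hypothetical stand-ins for the actor and critic in the snippet above.
actor = nn.Linear(4, 2)
critic = nn.Linear(4 + 2, 1)
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)

state_batch = torch.randn(8, 4)

q_values = critic(torch.cat([state_batch, actor(state_batch)], dim=-1))
policy_loss = -q_values.mean()
policy_loss.backward()

print(critic.weight.grad is not None)             # True: grads flow through the critic
critic_before = critic.weight.clone()
actor_optim.step()
print(torch.equal(critic_before, critic.weight))  # True: critic weights unchanged
```

Whether the leftover critic gradients matter then depends on the critic's own update zeroing them before its optimizer steps.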

@danijar That's actually an interesting perspective. But the mean can also be negative, right? If that's the case, the second term actually makes all actions more likely. So it's a...

Thanks a lot for the detailed response! I'm still in the process of understanding the derivation of the ELBO. Are there any helpful resources that I should consult?

I think this is a valid concern. Making state information available to the critic makes this implementation incorrect.

Did you reproduce this bad performance on multiple datasets? If so, which ones?