Jiayi Weng


> It seems that episode start signals are currently not used (hidden state must be reset when such signal is encountered) when working with RNN and on-policy algorithms. I think...

But in the training phase, `state` is set to `None`: https://github.com/thu-ml/tianshou/blob/3592f45446e6cc98423df2f1c28d8ca0ef2be821/tianshou/utils/net/common.py#L256-L269
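
For context, here is a minimal sketch (not tianshou's exact code) of why passing `state=None` amounts to a reset: PyTorch's `nn.LSTM` falls back to a zero initial hidden state when no state is given.

```python
import torch
import torch.nn as nn

class RecurrentSketch(nn.Module):
    """Toy stand-in for the linked Recurrent net (illustrative only)."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)

    def forward(self, obs: torch.Tensor, state=None):
        # With state=None, nn.LSTM initializes (h0, c0) to zeros,
        # so every sequence in the batch starts from a fresh state.
        out, (hidden, cell) = self.lstm(obs, state)
        return out, (hidden, cell)
```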

But the `reset_state` operation only applies to the env_ids with `done=True`, like:

```
env_id  0  1  2
done    T  F  T
state   0  /  0
```
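
A minimal sketch of that per-env reset (the function name and shapes are mine, not tianshou's): rows of the hidden state whose environment is done are zeroed, while the others pass through unchanged.

```python
import torch

def reset_hidden(hidden: torch.Tensor, done: torch.Tensor) -> torch.Tensor:
    """Zero the hidden-state rows of done environments.

    hidden: [num_envs, hidden_dim]; done: bool tensor of shape [num_envs].
    """
    mask = (~done).float().unsqueeze(-1)  # 1.0 keeps the row, 0.0 resets it
    return hidden * mask

hidden = torch.ones(3, 4)
done = torch.tensor([True, False, True])
print(reset_hidden(hidden, done))  # rows 0 and 2 are reset to zeros
```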

Do you mean that in the sampled trajectory there would be `done=True` in the middle? Actually, this never happens in the current implementation: only the last timestep may be `done=True`...

> when 0012 is fed to the agent, state 0 is fed twice?

Yes...

> This seems unusual to me and does not match the standard implementation…

I'll do the...
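
To make the "0012" case concrete, here is a small illustration (my own toy code, not tianshou's) of frame stacking at an episode start: indices before the first frame are clamped, so frame 0 is repeated.

```python
import numpy as np

obs = np.array([0, 1, 2, 3, 4])  # per-step observations of one episode
stack_num = 4

def stacked(index: int) -> np.ndarray:
    # clamp indices that fall before the episode start (index 0 here)
    idx = [max(i, 0) for i in range(index - stack_num + 1, index + 1)]
    return obs[idx]

print(stacked(2))  # -> [0 0 1 2], i.e. "0012": frame 0 is fed twice
```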

Just a legacy. Before version 0.4, tianshou used the following pipeline to collect data:

```
[main_buffer] + [list of cached buffers for storing episodes]
```

once an episode (say,...
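
A rough sketch of that pre-0.4 pipeline, under my reading of it (all names hypothetical): each environment appends to its own cached buffer, and a finished episode is flushed into the main buffer in one contiguous piece.

```python
num_envs = 4
main_buffer = []                         # stand-in for the main buffer
cached = [[] for _ in range(num_envs)]   # one cached buffer per env

def add_transition(env_id: int, transition: dict, done: bool) -> None:
    cached[env_id].append(transition)
    if done:
        # move the whole episode into the main buffer at once,
        # so episodes are stored contiguously
        main_buffer.extend(cached[env_id])
        cached[env_id].clear()
```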

It's better to modify the `forward` function in `PGPolicy`. I'm not sure how exactly to change the code... I'll figure it out later.

In `policy/modelfree/pg.py`:

```diff
 logits, hidden = self.actor(batch.obs, state=state)
 if isinstance(logits, tuple):
+    # this is for (mu, sigma) from Normal distribution
     dist = self.dist_fn(*logits)
 else:
+    # categorical distribution
+    # ...
```
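
For illustration, the two branches of that diff correspond to the following `dist_fn` choices (my example, using standard `torch.distributions`):

```python
import torch
from torch.distributions import Categorical, Normal

# Continuous actor: returns a (mu, sigma) tuple -> Normal distribution.
# Discrete actor: returns a single logits tensor -> Categorical distribution.
dist_fn_continuous = lambda mu, sigma: Normal(mu, sigma)
dist_fn_discrete = lambda logits: Categorical(logits=logits)

logits = (torch.zeros(3), torch.ones(3))  # (mu, sigma) from a Gaussian actor
if isinstance(logits, tuple):
    dist = dist_fn_continuous(*logits)    # unpack (mu, sigma)
else:
    dist = dist_fn_discrete(logits)       # single logits tensor
```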

I think we can still use this setting by setting the buffer's `stack_num` to >1. In short, when training RNN+CQL, we use `[bsz, len, dim]` to train a `Recurrent` network with...
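
A minimal sketch of the resulting batch shape (toy storage, not tianshou's buffer API): with `stack_num` set to the sequence length, each sampled index yields its last `seq_len` observations, producing the `[bsz, len, dim]` input the recurrent network expects.

```python
import numpy as np

seq_len, dim = 4, 8
obs_history = np.random.randn(100, dim)   # flat per-step storage

def sample(indices: np.ndarray) -> np.ndarray:
    # stack the seq_len frames ending at each index -> [bsz, len, dim]
    return np.stack([obs_history[i - seq_len + 1 : i + 1] for i in indices])

batch = sample(np.array([10, 20, 30]))
print(batch.shape)  # (3, 4, 8): [bsz, len, dim] for the recurrent network
```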