Jiayi Weng


> It seems that episode start signals are currently not used (hidden state must be reset when such signal is encountered) when working with RNN and on-policy algorithms. I think...

But in the training phase, `state` is set to `None`: https://github.com/thu-ml/tianshou/blob/3592f45446e6cc98423df2f1c28d8ca0ef2be821/tianshou/utils/net/common.py#L256-L269
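
For context, here is a minimal sketch (not tianshou's exact code) of why passing `state=None` amounts to a reset: PyTorch's `nn.LSTM` falls back to a zero initial hidden state when no state is given.

```python
import torch
import torch.nn as nn

class RecurrentSketch(nn.Module):
    """Toy stand-in for the linked Recurrent net (illustrative only)."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)

    def forward(self, obs: torch.Tensor, state=None):
        # With state=None, nn.LSTM initializes (h0, c0) to zeros,
        # so every sequence in the batch starts from a fresh state.
        out, (hidden, cell) = self.lstm(obs, state)
        return out, (hidden, cell)
```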

But the `reset_state` operation only applies to the env_ids with `done=True`, like:

```
env_id  0  1  2
done    T  F  T
state   0  /  0
```
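
A minimal sketch of that per-env reset (the function name and shapes are mine, not tianshou's): rows of the hidden state whose environment is done are zeroed, while the others pass through unchanged.

```python
import torch

def reset_hidden(hidden: torch.Tensor, done: torch.Tensor) -> torch.Tensor:
    """Zero the hidden-state rows of done environments.

    hidden: [num_envs, hidden_dim]; done: bool tensor of shape [num_envs].
    """
    mask = (~done).float().unsqueeze(-1)  # 1.0 keeps the row, 0.0 resets it
    return hidden * mask

hidden = torch.ones(3, 4)
done = torch.tensor([True, False, True])
print(reset_hidden(hidden, done))  # rows 0 and 2 are reset to zeros
```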

Do you mean that in the sampled trajectory there would be `done=True` in the middle? Actually, this never happens in the current implementation: only the last timestep may be `done=True`...

> when 0012 is fed to the agent, state 0 is fed twice?

Yes...

> This seems unusual to me and does not match the standard implementation…

I'll do the...
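
To make the "0012" case concrete, here is a small illustration (my own toy code, not tianshou's) of frame stacking at an episode start: indices before the first frame are clamped, so frame 0 is repeated.

```python
import numpy as np

obs = np.array([0, 1, 2, 3, 4])  # per-step observations of one episode
stack_num = 4

def stacked(index: int) -> np.ndarray:
    # clamp indices that fall before the episode start (index 0 here)
    idx = [max(i, 0) for i in range(index - stack_num + 1, index + 1)]
    return obs[idx]

print(stacked(2))  # -> [0 0 1 2], i.e. "0012": frame 0 is fed twice
```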

Just a legacy. Before version 0.4, tianshou used the following pipeline to collect data:

```
[main_buffer] + [list of cached buffers for storing episodes]
```

once an episode (say,...
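
A rough sketch of that pre-0.4 pipeline, under my reading of it (all names hypothetical): each environment appends to its own cached buffer, and a finished episode is flushed into the main buffer in one contiguous piece.

```python
num_envs = 4
main_buffer = []                         # stand-in for the main buffer
cached = [[] for _ in range(num_envs)]   # one cached buffer per env

def add_transition(env_id: int, transition: dict, done: bool) -> None:
    cached[env_id].append(transition)
    if done:
        # move the whole episode into the main buffer at once,
        # so episodes are stored contiguously
        main_buffer.extend(cached[env_id])
        cached[env_id].clear()
```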

It's better to modify the `forward` function in `PGPolicy`. I'm not sure how exactly to change the code... I'll figure it out later.

In `policy/modelfree/pg.py`:

```diff
 logits, hidden = self.actor(batch.obs, state=state)
 if isinstance(logits, tuple):
+    # this is for (mu, sigma) from Normal distribution
     dist = self.dist_fn(*logits)
 else:
+    # categorical distribution
+    # ...
```
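
For illustration, the two branches of that diff correspond to the following `dist_fn` choices (my example, using standard `torch.distributions`):

```python
import torch
from torch.distributions import Categorical, Normal

# Continuous actor: returns a (mu, sigma) tuple -> Normal distribution.
# Discrete actor: returns a single logits tensor -> Categorical distribution.
dist_fn_continuous = lambda mu, sigma: Normal(mu, sigma)
dist_fn_discrete = lambda logits: Categorical(logits=logits)

logits = (torch.zeros(3), torch.ones(3))  # (mu, sigma) from a Gaussian actor
if isinstance(logits, tuple):
    dist = dist_fn_continuous(*logits)    # unpack (mu, sigma)
else:
    dist = dist_fn_discrete(logits)       # single logits tensor
```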

I think we can still use this setting by setting the buffer's `stack_num` to >1. In short, when training RNN+CQL, we use `[bsz, len, dim]` to train a `Recurrent` network with...
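
A minimal sketch of the resulting batch shape (toy storage, not tianshou's buffer API): with `stack_num` set to the sequence length, each sampled index yields its last `seq_len` observations, producing the `[bsz, len, dim]` input the recurrent network expects.

```python
import numpy as np

seq_len, dim = 4, 8
obs_history = np.random.randn(100, dim)   # flat per-step storage

def sample(indices: np.ndarray) -> np.ndarray:
    # stack the seq_len frames ending at each index -> [bsz, len, dim]
    return np.stack([obs_history[i - seq_len + 1 : i + 1] for i in indices])

batch = sample(np.array([10, 20, 30]))
print(batch.shape)  # (3, 4, 8): [bsz, len, dim] for the recurrent network
```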