Jiayi Weng


> note that int(np.prod(state_shape)) = len*dim

I don't think so. `state_shape` should always be a single frame, i.e., `int(np.prod(state_shape)) = dim`. If it's not the case, you should modify it...

```python
In [16]: m = nn.LSTM(input_size=3, hidden_size=10, num_layers=1, batch_first=True)

In [17]: s = torch.zeros([64, 1, 3])

In [18]: ns, (h, c) = m(s)

In [19]: ns.shape, h.shape, c.shape
Out[19]: (torch.Size([64, 1, 10]), torch.Size([1, 64, 10]), torch.Size([1, 64, 10]))
```

It should be `dim`. Let's take the Atari example: the observation space is (4, 84, 84), where 4 is `len`. However, when defining the recurrent network, the state_shape should be `84*84` instead of...
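
A rough sketch of that reshaping (not from the original thread, just an illustration with plain PyTorch): the 4 stacked frames become the sequence dimension, while the per-frame size `84*84` is the LSTM's input size.

```python
import torch
import torch.nn as nn

# Stacked Atari observation: (batch, len, H, W) = (32, 4, 84, 84)
obs = torch.zeros(32, 4, 84, 84)

# state_shape corresponds to a single frame, so input_size = 84 * 84
lstm = nn.LSTM(input_size=84 * 84, hidden_size=128, num_layers=1, batch_first=True)

# Flatten each frame; the 4 stacked frames play the role of the sequence length
seq = obs.reshape(32, 4, 84 * 84)  # (batch, seq_len, dim)
out, (h, c) = lstm(seq)
print(out.shape)  # torch.Size([32, 4, 128])
```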

But here comes the problem: there are two ways to perform this kind of stack-obs (see the sketch below):
1. gym.Env outputs a single frame -- stack by buffer.sample();
2. gym.Env outputs stacked frame by...
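
A minimal sketch of the two options, assuming Tianshou's `ReplayBuffer(stack_num=...)` and gym's `FrameStack` wrapper; this is an illustration, not code from the thread.

```python
import gym
from tianshou.data import ReplayBuffer

# Option 1: the env returns a single (84, 84) frame; the buffer stacks
# the last 4 frames on the fly when sampling.
env = gym.make("PongNoFrameskip-v4")
buf = ReplayBuffer(size=100000, stack_num=4)   # buffer-side frame stacking

# Option 2: the env itself returns a stacked (4, 84, 84) observation,
# e.g. via a wrapper, and the buffer stores it as-is.
stacked_env = gym.wrappers.FrameStack(env, num_stack=4)
plain_buf = ReplayBuffer(size=100000)          # no stacking in the buffer
```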

Did you only change the network structure, or also other hyperparameters like lr (the reward curve is sensitive to those)? Honestly speaking, I haven't used RNN+SAC to run experiments :(

Maybe switch to something like:
```python
observation_space: DictSpace(...)
obs = {"player_a": np.array(...), "player_b": np.array(...)}  # for all players
# if only player_a, fill player_b with np.zeros_like(...)
```
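
A possible concrete version of that layout (my own sketch using `gym.spaces.Dict`; the key names and shapes are placeholders, not from the thread):

```python
import numpy as np
import gym
from gym import spaces

# One sub-space per player; absent players are zero-filled.
observation_space = spaces.Dict({
    "player_a": spaces.Box(low=0, high=1, shape=(10,), dtype=np.float32),
    "player_b": spaces.Box(low=0, high=1, shape=(10,), dtype=np.float32),
})

obs_a = np.random.rand(10).astype(np.float32)
obs = {
    "player_a": obs_a,
    "player_b": np.zeros_like(obs_a),  # only player_a is present this step
}
```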

Yep... Basically what the MAPM does is to split the whole observation into several parts and send each part to its own policy. Also for the action: concat at the end, then...
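
A conceptual sketch of that split/concat flow (not the actual MultiAgentPolicyManager code, just the idea; the per-agent slicing and dummy policies are hypothetical):

```python
import numpy as np

def multi_agent_step(obs, policies, obs_slices):
    """Split the joint observation, query each policy, then concat the actions."""
    actions = []
    for agent_id, policy in policies.items():
        agent_obs = obs[obs_slices[agent_id]]   # the part belonging to this agent
        actions.append(policy(agent_obs))       # each policy only sees its own part
    return np.concatenate(actions)              # joint action sent back to the env

# Hypothetical usage with two dummy policies acting on halves of a 6-dim obs.
policies = {"a": lambda o: o[:1], "b": lambda o: o[:1]}
obs_slices = {"a": slice(0, 3), "b": slice(3, 6)}
joint_action = multi_agent_step(np.arange(6.0), policies, obs_slices)
print(joint_action)  # [0. 3.]
```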

https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#rnn-style-training https://github.com/thu-ml/tianshou/issues/486#issuecomment-1002665193

Please see https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#rnn-style-training