Jiayi Weng
> note that int(np.prod(state_shape)) = len*dim

I don't think so. `state_shape` should always be a single frame, i.e., `int(np.prod(state_shape)) = dim`. If that's not the case, you should modify it...
```python
In [15]: import torch; from torch import nn

In [16]: m = nn.LSTM(input_size=3, hidden_size=10, num_layers=1, batch_first=True)

In [17]: s = torch.zeros([64, 1, 3])

In [18]: ns, (h, c) = m(s)

In [19]: ns.shape, h.shape, c.shape
Out[19]: (torch.Size([64, 1, 10]), torch.Size([1, 64, 10]), torch.Size([1, 64, 10]))
```
Should be `dim`. Let's take the Atari example: the observation space is (4, 84, 84), where 4 is `len`. However, when defining the recurrent network, `state_shape` should be `84*84` instead of `4*84*84`.
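For concreteness, here is a minimal sketch of that convention, assuming the stock `Recurrent` net from `tianshou.utils.net.common`; the batch size, stack length, and action count below are only illustrative:

```python
# Sketch: the recurrent net receives the single-frame dim (84*84), not len*dim
# (4*84*84); the stacked frames show up as the sequence axis instead.
import numpy as np
import torch
from tianshou.utils.net.common import Recurrent

state_shape = (84, 84)   # one frame, so int(np.prod(state_shape)) == 84 * 84
action_shape = 6         # e.g. number of discrete Atari actions (illustrative)
net = Recurrent(layer_num=1, state_shape=state_shape, action_shape=action_shape)

# obs sampled with frame stacking arrives as [bsz, len, dim] = [64, 4, 84*84]
obs = torch.zeros(64, 4, int(np.prod(state_shape)))
logits, state = net(obs, state=None)
print(logits.shape)      # torch.Size([64, 6])
```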
But here comes the problem: there are two ways to perform this kind of obs-stacking:

1. gym.Env outputs a single frame -- the stacking happens in `buffer.sample()` (see the sketch below);
2. gym.Env outputs an already-stacked frame by itself...
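Option 1 is what the cheatsheet documents via `stack_num`. A minimal sketch, assuming a Tianshou version whose `ReplayBuffer.add()` takes a `Batch` with a `done` flag; the frame sizes and dummy data are illustrative:

```python
# Option 1 sketch: the env returns single (84, 84) frames; the buffer stacks
# them at sample time via stack_num.
import numpy as np
from tianshou.data import Batch, ReplayBuffer

buf = ReplayBuffer(size=1000, stack_num=4)  # each sampled obs becomes 4 stacked frames
for i in range(10):
    buf.add(Batch(
        obs=np.full((84, 84), i, dtype=np.float32),
        act=0, rew=0.0, done=False,
        obs_next=np.full((84, 84), i + 1, dtype=np.float32),
        info={},
    ))

batch, indices = buf.sample(batch_size=8)
print(batch.obs.shape)  # (8, 4, 84, 84): frames stacked along a new axis
```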
Glad to hear that!
Did you change only the network structure, and not other hyperparameters like lr (the reward curve is sensitive to those)? Honestly speaking, I haven't run RNN+SAC experiments :(
Maybe switch to something like:

```python
observation_space: DictSpace(...)

obs = {"player_a": np.array(...), "player_b": np.array(...)}  # for all players
# if only player_a, fill player_b with np.zeros_like(...)
```
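Spelled out with gym's `spaces.Dict` (the key names come from the snippet above; the shapes, bounds, and dtypes are only placeholders):

```python
# Illustrative dict observation; shapes/bounds/dtypes are placeholders.
import numpy as np
from gym import spaces

observation_space = spaces.Dict({
    "player_a": spaces.Box(low=-1.0, high=1.0, shape=(10,), dtype=np.float32),
    "player_b": spaces.Box(low=-1.0, high=1.0, shape=(10,), dtype=np.float32),
})

obs = {
    "player_a": np.random.uniform(-1, 1, size=(10,)).astype(np.float32),
    # if only player_a is present this step, fill player_b with zeros of the same shape
    "player_b": np.zeros((10,), dtype=np.float32),
}
assert observation_space.contains(obs)
```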
Yep... Basically, what the MAPM does is split the whole observation into several parts and send one to each policy. Same for the actions: concatenate them at the end, then...
https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#rnn-style-training https://github.com/thu-ml/tianshou/issues/486#issuecomment-1002665193
Please see https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#rnn-style-training