Super-mario-bros-PPO-pytorch
about train.py
Hello, I was reading your code and have a question about line 114 of train.py, i.e. values = torch.cat(values).detach(). I think this statement should come after line 123. Otherwise, in line 120, i.e. gae = gae + reward + opt.gamma * next_value.detach() * (1 - done) - value.detach(), it will become a constant (size=1) instead of a vector (size=8). Could you check this and let me know? Thank you.
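To make the shape concern concrete, here is a minimal sketch and not the repository's actual code. It assumes values, rewards and dones are per-step lists of tensors of size 8 (one entry per environment), and that the loop walks them in reverse as in the quoted line. The sizes, the gamma and tau numbers, and the helper name gae_returns are hypothetical, chosen only for illustration.

```python
import torch

# Hypothetical sizes: 8 matches the vector size mentioned in the question.
num_envs, num_steps = 8, 4
gamma, tau = 0.9, 1.0  # hypothetical discount and GAE parameters

# Assumed rollout layout: one tensor of size num_envs per step.
values = [torch.randn(num_envs) for _ in range(num_steps)]
rewards = [torch.randn(num_envs) for _ in range(num_steps)]
dones = [torch.zeros(num_envs) for _ in range(num_steps)]
next_value = torch.randn(num_envs)

# Concatenating first flattens the list into one 1-D tensor, so iterating
# over it yields 0-dim scalars rather than per-step vectors.
early = torch.cat(values).detach()
early_step_shape = list(early)[-1].shape

# Iterating over the original list keeps one vector per step.
late_step_shape = values[-1].shape


def gae_returns(values, rewards, dones, next_value):
    """Backward GAE-style accumulation over per-step vectors (sketch)."""
    gae = 0
    out = []
    for value, reward, done in list(zip(values, rewards, dones))[::-1]:
        gae = gae * gamma * tau
        gae = gae + reward + gamma * next_value.detach() * (1 - done) - value.detach()
        next_value = value
        out.append(gae)
    return out[::-1]


# Loop over the list first, and concatenate only afterwards.
returns = torch.cat(gae_returns(values, rewards, dones, next_value)).detach()
values_flat = torch.cat(values).detach()

print(early_step_shape, late_step_shape, returns.shape)
```

Under these assumptions, moving the torch.cat call after the loop keeps each value a vector of size 8 inside the loop, while the flattened tensors are still available afterwards.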
I think it should be as you