Reinforcement-Learning-Pytorch-Cartpole
Why compute the action from the target net rather than the online net?
https://github.com/g6ling/Reinforcement-Learning-Pytorch-Cartpole/blob/ecb7b622cfefe825ac95388cceb6752413d90a2a/POMDP/4-R2D2-Single/train.py#L76
Another question: why do you store only the hidden state from the target net and not the one from the online net?
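For context on the first question, here is a minimal sketch of the usual Double DQN target, where the online net *selects* the next action and the target net *evaluates* it, next to the vanilla DQN target that uses the target net for both. This is not the code from the linked `train.py`; the function names are hypothetical, and the sketch uses plain feedforward Q-networks (R2D2 itself uses a recurrent network, so the real code also has to pass hidden states through both nets).

```python
import torch
import torch.nn as nn


def vanilla_dqn_target(target_net, next_states, rewards, dones, gamma=0.99):
    """Hypothetical helper: vanilla DQN, target net both selects and evaluates."""
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        return rewards + gamma * (1.0 - dones) * next_q


def double_dqn_target(online_net, target_net, next_states, rewards, dones, gamma=0.99):
    """Hypothetical helper: Double DQN, online net selects, target net evaluates."""
    with torch.no_grad():
        # Action selection: greedy action under the online network
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # Action evaluation: Q-value of that action under the target network
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q


if __name__ == "__main__":
    # Toy 1-D state, 2 actions, weights chosen so the two nets disagree
    online = nn.Linear(1, 2, bias=False)
    target = nn.Linear(1, 2, bias=False)
    with torch.no_grad():
        online.weight.copy_(torch.tensor([[1.0], [0.0]]))  # online prefers action 0
        target.weight.copy_(torch.tensor([[0.0], [2.0]]))  # target prefers action 1
    s = torch.tensor([[1.0]])
    r = torch.tensor([1.0])
    d = torch.tensor([0.0])
    print(vanilla_dqn_target(target, s, r, d, gamma=0.5))       # 1 + 0.5 * 2
    print(double_dqn_target(online, target, s, r, d, gamma=0.5))  # 1 + 0.5 * 0
```

Decoupling selection from evaluation is what reduces the overestimation bias of taking a max over the target net's own noisy estimates; if the action were taken from the target net, the second function would reduce to the first.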