Richard

Results: 1 comment by Richard

> In the [line used to define the returns](https://github.com/philtabor/Youtube-Code-Repository/blob/1ef76059bf55f7df9ccc09fce0e0bfb7c13e89bd/ReinforcementLearning/PolicyGradient/PPO/torch/ppo_torch.py#L186), we use the GAE + values as the target for the critic to learn. Is this correct?
>
> My intuition...
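For context on the quoted question, here is a minimal sketch of one common way PPO implementations compute GAE and then form the critic target as advantages plus values. It is not the code at the linked line. The function name `gae_and_returns`, its parameters, and the convention that `dones[t]` marks an episode ending after step `t` are illustrative assumptions.

```python
from typing import List, Tuple


def gae_and_returns(
    rewards: List[float],
    values: List[float],
    dones: List[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: GAE advantages and critic targets (advantages + values)."""
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    # Walk backwards so each step can reuse the advantage of the following step.
    for t in reversed(range(n)):
        # Bootstrap from last_value after the final step, else from V(s_{t+1}).
        next_value = last_value if t == n - 1 else values[t + 1]
        # Assumed convention: dones[t] is True if the episode ended after step t.
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # GAE recursion: A_t = delta_t + gamma * lam * A_{t+1}
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Adding V(s_t) back yields the lambda-return, used here as the critic target.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    adv, ret = gae_and_returns(
        [1.0, 1.0, 1.0], [0.5, 0.2, 0.7], [False, False, False], 0.0, gamma=0.9, lam=1.0
    )
    print(ret)
```

Because advantages plus values equals the TD(λ) return, the target interpolates between two familiar cases: with `lam=1.0` and a zero bootstrap value it reduces to the discounted Monte Carlo return, and with `lam=0.0` it reduces to the one-step TD target `r_t + gamma * V(s_{t+1})`.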