ReinforcementLearning.jl
Gain in VPGPolicy does not account for terminal states?
I think we need to pass traj[:terminal] to discount_rewards so that the gain is computed only up to the end of each episode. As it stands, the discounted sum seems to run across episode boundaries:
https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/6fe6aa01208c325f8f990032621c18b61d574b37/src/ReinforcementLearningZoo/src/algorithms/policy_gradient/vpg.jl#L105
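To make the concern concrete, here is a minimal sketch of a terminal-aware gain computation. The function `discounted_returns` is a hypothetical stand-alone helper written for illustration; it is not the package's `discount_rewards`, whose exact signature is not shown in this thread. It also assumes that `terminals[t] == true` means the transition at step `t` ended an episode, so the reward at `t` is the last one belonging to that episode.

```julia
# Hypothetical helper for illustration only; NOT the package's `discount_rewards`.
# Assumption: terminals[t] == true marks the last step of an episode.
function discounted_returns(rewards::AbstractVector{<:Real},
                            terminals::AbstractVector{Bool},
                            γ::Real)
    length(rewards) == length(terminals) ||
        throw(DimensionMismatch("rewards and terminals must have equal length"))

    gains = zeros(float(eltype(rewards)), length(rewards))
    running = zero(eltype(gains))

    # Walk backwards so that each gain is r_t + γ * G_{t+1}.
    for t in length(rewards):-1:1
        # At an episode boundary, drop everything accumulated from later steps:
        # those rewards belong to the next episode.
        if terminals[t]
            running = zero(eltype(gains))
        end
        running = rewards[t] + γ * running
        gains[t] = running
    end
    return gains
end

rewards   = [1.0, 1.0, 1.0, 1.0, 1.0]
terminals = [false, false, true, false, true]  # two episodes: steps 1-3 and 4-5

discounted_returns(rewards, terminals, 0.5)
# → [1.75, 1.5, 1.0, 1.5, 1.0]
```

If the terminal flags are ignored (all `false`), the same call gives `[1.9375, 1.875, 1.75, 1.5, 1.0]`, so the first episode's gains pick up rewards from the second episode.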
Agree, nice catch!
How about making a PR?
I'm happy to do it, but I've never made a PR before. Could you guide me through the process? I guess I have to fork the package and make the changes on a separate branch of my local copy?
Yep, you may find many resources by searching "first github pr" (like this one).