ReinforcementLearning.jl: Gain in VPGPolicy does not account for terminal states?

Open · ArjunNarayanan opened this issue 3 years ago · 3 comments

I think we need to pass `traj[:terminal]` to `discount_rewards` so that the gain is computed only up to the end of each episode, instead of carrying rewards from the next episode across the boundary?

https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/6fe6aa01208c325f8f990032621c18b61d574b37/src/ReinforcementLearningZoo/src/algorithms/policy_gradient/vpg.jl#L105
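To illustrate the point, here is a minimal sketch written in Python rather than Julia. It does not reproduce ReinforcementLearning.jl's actual `discount_rewards` signature or internals. It assumes the convention that `terminals[t]` is true when step `t` was the last step of its episode. The return G_t = r_t + γ·G_{t+1} is accumulated backwards, and the running sum is reset whenever a terminal step is reached.

```python
from typing import List, Optional, Sequence


def discount_rewards(
    rewards: Sequence[float],
    gamma: float,
    terminals: Optional[Sequence[bool]] = None,
) -> List[float]:
    """Illustrative sketch (not the library's API): discounted returns
    G_t = r_t + gamma * G_{t+1}, computed from the last step backwards.

    Assumption: terminals[t] is True when step t ended an episode. At such
    a step the running return is reset, so rewards from the following
    episode do not leak into G_t.
    """
    if terminals is None:
        terminals = [False] * len(rewards)
    if len(terminals) != len(rewards):
        raise ValueError("rewards and terminals must have the same length")

    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if terminals[t]:
            # Episode ended at step t: nothing after it belongs to this episode.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0, 1.0]
    terminals = [False, True, False, False]  # first episode ends at step 1
    print(discount_rewards(rewards, 0.5))             # ignores the boundary
    print(discount_rewards(rewards, 0.5, terminals))  # resets at the boundary
```

With γ = 0.5, ignoring terminals gives `[1.875, 1.75, 1.5, 1.0]`: step 1 wrongly absorbs rewards from the second episode. Passing the terminal flags gives `[1.5, 1.0, 1.5, 1.0]`, where each episode's returns stop at its own last step.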

Jan 26 '22 00:01 ArjunNarayanan

Agree, nice catch!

How about making a PR?

Jan 26 '22 04:01 findmyway

I'd be happy to do it, but I've never made a PR before. Could you guide me through the process? I guess I have to fork the package and make the changes in a separate branch of my local copy?

Jan 26 '22 04:01 ArjunNarayanan

Yep, you can find many resources by searching for "first github pr" (like this one).

Jan 26 '22 04:01 findmyway