ReinforcementLearning.jl TRPO

TRPO

Open baedan opened this issue 3 years ago • 1 comments

This PR implements Trust Region Policy Optimization (TRPO) and adds a CartPole experiment for it.

To this end, I wrote a few utility functions that are shared among the policy gradient policies (#737). Perhaps a better approach would be a PolicyGradientPolicy type that wraps the different learners.

Aug 08 '22 05:08 baedan

Looks fine to me in general. I think there's still room for improvement in the gradient part. I'll add more detailed comments this weekend.

Aug 10 '22 06:08 findmyway

I'll merge this first. I may find some time in the next week to polish this further ;)

Sep 11 '22 04:09 findmyway
