ReinforcementLearning.jl

Implement TRPO/ACER

Open: jbrea opened this issue 7 years ago • 1 comment

See also John Schulman's Python implementations.

jbrea, Jun 07 '18 19:06

finding it pretty difficult to implement TRPO. Zygote doesn't particularly like higher-order gradients, and Flux doesn't particularly like dealing with flat parameters, both of which are necessary for TRPO. i feel like i'm constantly bumping into the limits of what i'm expected to be doing here.
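To make the two requirements above concrete, here is a minimal sketch in plain Python (the thread has no code of its own, so the language, the helper names `flatten`, `unflatten` and `hvp_fd`, and the toy objective are all assumptions for illustration). It shows what "flat parameters" means and one possible workaround for the second-order step: approximating a Hessian-vector product with a central difference of first-order gradients. That finite-difference trick is a substitute for true higher-order autodiff, not something the comment above says was tried. On the Julia side, `Flux.destructure` returns a flat parameter vector together with a function that rebuilds the model, which covers the flattening half.

```python
# Hypothetical helpers for illustration; not part of Flux, Zygote, or any library.

def flatten(params):
    """Concatenate per-layer parameter lists into one flat list, remembering sizes."""
    flat = [x for layer in params for x in layer]
    sizes = [len(layer) for layer in params]
    return flat, sizes


def unflatten(flat, sizes):
    """Invert flatten: split a flat list back into per-layer lists."""
    out, i = [], 0
    for n in sizes:
        out.append(flat[i:i + n])
        i += n
    return out


def hvp_fd(grad, theta, v, eps=1e-5):
    """Approximate H @ v by a central difference of the gradient.

    Only first-order gradients are evaluated, so no nested differentiation
    is needed. The result is an approximation whose accuracy depends on eps.
    """
    plus = grad([t + eps * vi for t, vi in zip(theta, v)])
    minus = grad([t - eps * vi for t, vi in zip(theta, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


# Toy objective f(a, b) = a^2 + 3ab with a hand-written gradient.
# Its Hessian is [[2, 3], [3, 0]].
def toy_grad(theta):
    a, b = theta
    return [2 * a + 3 * b, 3 * a]


flat, sizes = flatten([[1.0, 2.0], [3.0]])
hv = hvp_fd(toy_grad, [0.5, -1.0], [1.0, 2.0])  # approximates H @ [1, 2]
```

In a real TRPO step the gradient passed to `hvp_fd` would be that of the mean KL divergence between the old and new policy with respect to the flat parameter vector; the toy gradient here only stands in for it.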

the original implementation (appendix C), which uses the Fisher information matrix with the conjugate gradient method, is probably more doable right now, but (i think) it can only deal with discrete action spaces.
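For reference, the conjugate gradient part can be sketched without any autodiff at all. The snippet below is a generic textbook conjugate gradient solver in plain Python, not code from the paper or from this repository. The names `conjugate_gradient`, `fvp` and `toy_fvp` are made up for illustration, and the small matrix `F` is a stand-in for a Fisher information matrix. The point is that the solver only ever asks for products `F v`, so the matrix itself never has to be formed.

```python
def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Solve F x = g for a symmetric positive-definite F, given only fvp(v) = F v."""
    x = [0.0] * len(g)
    r = list(g)  # residual g - F x, which equals g because x starts at zero
    p = list(g)  # current search direction
    rr = sum(ri * ri for ri in r)
    for _ in range(iters):
        if rr < tol:
            break
        Fp = fvp(p)
        alpha = rr / sum(pi * fi for pi, fi in zip(p, Fp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fi for ri, fi in zip(r, Fp)]
        rr_new = sum(ri * ri for ri in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Toy stand-in for a Fisher matrix (assumption for illustration only).
F = [[4.0, 1.0], [1.0, 3.0]]


def toy_fvp(v):
    """Matrix-vector product with the toy matrix F."""
    return [sum(a * b for a, b in zip(row, v)) for row in F]


# Search direction x with F x = g for the policy-gradient stand-in g = [1, 2].
step = conjugate_gradient(toy_fvp, [1.0, 2.0])
```

In TRPO the vector `g` would be the policy gradient and `fvp` would return Fisher-vector products, computed either analytically or through a Hessian-vector product of the KL divergence.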

if anyone is interested in helping, i'll clean up the code and start a draft PR :)

baedan, Jul 25 '22 09:07