ReinforcementLearning.jl Implement TRPO/ACER

Open jbrea opened this issue 7 years ago • 1 comment

[ ] ACER
[ ] TRPO

See also John Schulman's Python implementations

Jun 07 '18 19:06 jbrea

finding it pretty difficult to implement TRPO. Zygote doesn't particularly like higher-order gradients, and Flux doesn't particularly like dealing with flat parameters, both of which are necessary for TRPO. i feel like i'm constantly bumping into the limits of what i'm expected to be doing here.
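
to make the two requirements concrete, here is a minimal sketch in plain Python rather than Julia, so it says nothing about what Zygote or Flux can do. it flattens a list of parameter groups into one vector and approximates a Hessian-vector product by a central difference of first-order gradients, which is one way to sidestep second-order autodiff. the names `flatten`, `unflatten`, `hvp_fd` and `toy_grad` are hypothetical, and the quadratic objective is only a stand-in for the KL term.

```python
# Sketch only: all names here are hypothetical and not taken from any library.

def flatten(params):
    """Concatenate parameter groups into one flat list and record group sizes."""
    sizes = [len(p) for p in params]
    flat = [x for p in params for x in p]
    return flat, sizes

def unflatten(flat, sizes):
    """Rebuild the original parameter groups from a flat list."""
    out, i = [], 0
    for n in sizes:
        out.append(flat[i:i + n])
        i += n
    return out

def hvp_fd(grad, theta, v, eps=1e-5):
    """Approximate H @ v using a central difference of the gradient.

    Only first-order gradients are evaluated; accuracy depends on eps.
    """
    plus = grad([t + eps * x for t, x in zip(theta, v)])
    minus = grad([t - eps * x for t, x in zip(theta, v)])
    return [(a - b) / (2 * eps) for a, b in zip(plus, minus)]

# Toy stand-in objective: f(a, b) = a^2 + 3ab + 2b^2, with Hessian [[2, 3], [3, 4]].
def toy_grad(theta):
    a, b = theta
    return [2 * a + 3 * b, 3 * a + 4 * b]

params = [[1.0], [2.0]]            # two parameter groups of one value each
theta, sizes = flatten(params)     # flat view used by the optimizer
hv = hvp_fd(toy_grad, theta, [1.0, 0.0])   # approximates the first Hessian column
restored = unflatten(theta, sizes)          # back to the grouped layout
```

in a real setup `toy_grad` would be replaced by the gradient of the mean KL divergence with respect to the flat parameters, and the finite-difference step would trade some accuracy for not needing nested differentiation.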

the original approach (appendix C of the TRPO paper), which uses the Fisher information matrix with the conjugate gradient method, is probably more doable right now, but that (i think) can only deal with discrete action spaces.
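
as a rough illustration of that route, the sketch below solves `F x = g` with conjugate gradient using only Fisher-vector products, so the matrix is never built. it is a toy under stated assumptions, not the paper's full procedure: the policy is a single softmax over three logits (one state, discrete actions), for which the Fisher matrix with respect to the logits is `diag(p) - p pᵀ`, and a damping term is added because that matrix is singular. the function names, the damping value and the vector `g` are all made up for the example.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fisher_vec(p, v, damping=0.1):
    """Compute (diag(p) - p p^T + damping * I) @ v without forming the matrix.

    Assumption: Fisher matrix of one categorical distribution w.r.t. its logits.
    """
    pv = dot(p, v)
    return [pi * (vi - pv) + damping * vi for pi, vi in zip(p, v)]

def conjugate_gradient(matvec, b, iters=10, tol=1e-20):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    d = list(b)          # search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        Ad = matvec(d)
        alpha = rr / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return x

p = softmax([0.5, -1.0, 2.0])      # hypothetical action probabilities
g = [1.0, 0.0, -1.0]               # hypothetical policy gradient
step = conjugate_gradient(lambda v: fisher_vec(p, v), g)

# Check the solution by recomputing the residual of the damped system.
residual = [a - b for a, b in zip(fisher_vec(p, step), g)]
```

with a neural network policy the closed-form `fisher_vec` would have to be replaced by a product computed through the network's parameters, which is where the gradient machinery mentioned above comes back in.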

if anyone is interested in helping i'll clean up code and start a draft PR :)

Jul 25 '22 09:07 baedan
