ReinforcementLearning.jl
Implement TRPO/ACER
finding it pretty difficult to implement TRPO. Zygote doesn't particularly like higher-order gradients, and Flux doesn't particularly like dealing with flat parameters, both of which are necessary for TRPO. i feel like i'm constantly bumping into the limits of what i'm expected to be doing here.
the original implementation (appendix C), which uses the Fisher information matrix with the conjugate gradient method, is probably more doable right now, but (i think) it can only deal with discrete action spaces.
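for reference, the conjugate gradient part only needs a function that returns Fisher-vector products, never the full matrix. here is a minimal sketch of that solver in plain Julia. the names `conjugate_gradient`, `fvp`, `F`, `g` and `damping` are made up for illustration, and the small dense matrix `F` is only a stand-in for a real Fisher matrix; in an actual TRPO implementation `fvp` would have to come from the policy's KL divergence, which is not shown here.

```julia
using LinearAlgebra

# Solve A * x = b for a symmetric positive-definite A,
# given only a function Avp that maps v to A * v.
# b is assumed to be a floating-point vector.
function conjugate_gradient(Avp, b::AbstractVector; iters::Int = 10, tol::Real = 1e-10)
    x = zero(b)        # initial guess x = 0
    r = copy(b)        # residual b - A * x, which equals b for x = 0
    p = copy(b)        # first search direction
    rs = dot(r, r)
    for _ in 1:iters
        rs < tol && break              # residual small enough, stop early
        Ap = Avp(p)
        α = rs / dot(p, Ap)            # step length along p
        x .+= α .* p
        r .-= α .* Ap
        rs_new = dot(r, r)
        p .= r .+ (rs_new / rs) .* p   # next conjugate direction
        rs = rs_new
    end
    return x
end

# Toy stand-ins (hypothetical values, not from a real policy).
F = [4.0 1.0; 1.0 3.0]       # pretend Fisher information matrix
g = [1.0, 2.0]               # pretend policy gradient
damping = 1e-3               # small damping term added to the diagonal

# Fisher-vector product; a real version would avoid forming F at all.
fvp(v) = F * v .+ damping .* v

step_dir = conjugate_gradient(fvp, g)   # approximately (F + damping * I) \ g
```

the point of the closure is that the solver never touches `F` directly, so the same function should work once `fvp` is backed by automatic differentiation on flat parameters.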
if anyone is interested in helping, i'll clean up the code and start a draft PR :)