The algorithm has recently been moved to https://github.com/jjkke88/RL_toolbox

trpo

Trust region policy optimization (TRPO) based on gym and tensorflow.

There are three versions of trpo: one for discrete action spaces such as MountainCar, one for discrete action space tasks with images as input such as Atari games, and one for continuous action spaces such as Pendulum.
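The difference between the discrete and continuous versions comes down to the action distribution the policy outputs. The following is a minimal sketch, not code from this repository: it assumes a categorical distribution for discrete actions and a diagonal Gaussian for continuous actions, and the function names `sample_discrete`, `sample_continuous`, and `gaussian_log_prob` are hypothetical.

```python
import math
import random


def sample_discrete(probs, rng=random):
    """Sample an action index from a categorical distribution (discrete case)."""
    r = rng.random()
    cumulative = 0.0
    for action, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return action
    return len(probs) - 1  # guard against floating-point round-off


def sample_continuous(mean, log_std, rng=random):
    """Sample a real-valued action vector from a diagonal Gaussian (continuous case)."""
    return [rng.gauss(m, math.exp(s)) for m, s in zip(mean, log_std)]


def gaussian_log_prob(action, mean, log_std):
    """Log-density of `action` under a diagonal Gaussian with the given mean and log std."""
    total = 0.0
    for a, m, s in zip(action, mean, log_std):
        z = (a - m) / math.exp(s)
        total += -0.5 * z * z - s - 0.5 * math.log(2.0 * math.pi)
    return total
```

In the discrete case the network would typically output `probs` through a softmax; in the continuous case it would output `mean` (and possibly `log_std`) directly.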

The environment is based on OpenAI gym.

Part of the code refers to rllab.

dependency

tensorflow 0.10
prettytensor
latest openai gym

code structure

baseline: estimation of the baseline (value) function $V_\pi$
checkpoint: folder that stores model files; it must not be deleted, or errors will occur
distribution: distribution base class, used to calculate probabilities under a distribution, for example a Gaussian
logger: contains a Logger class for logging data to a .csv file
agent: agents for discrete and continuous action spaces
log: stores log files
experiment: contains several main files; running a main file starts training or testing
environment.py: environment
krylov.py: implementations of some math methods: conjugate gradient, calculating the Hessian matrix
parameters.py: config file
utils.py: implementations of some basic functions: getFlat, setFlat, lineaSearch
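
The two numerical pieces named above, conjugate gradient and line search, can be sketched as follows. This is an illustrative sketch in plain Python, not the contents of krylov.py or utils.py: the names `dot`, `conjugate_gradient`, and `line_search` and their signatures are assumptions. In TRPO the matrix is usually the Hessian of the KL divergence, accessed only through matrix-vector products, and the right-hand side is the policy gradient. The line search here is simplified to accept any step that lowers the loss.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, iters=10, tol=1e-10):
    """Approximately solve A x = b for symmetric positive definite A,
    where A is available only through matvec(v) = A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x = 0 initially
    p = list(b)  # search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def line_search(loss, x, full_step, max_backtracks=10):
    """Backtracking line search: halve the step until the loss improves."""
    base = loss(x)
    for k in range(max_backtracks):
        frac = 0.5 ** k
        candidate = [xi + frac * si for xi, si in zip(x, full_step)]
        if loss(candidate) < base:
            return candidate
    return x  # no improving step found; keep the old parameters
```

A typical use would be to compute a step direction with `conjugate_gradient`, scale it to satisfy the KL constraint, and then pass the scaled step to `line_search` on the flattened parameter vector.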

recent work

implemented multi-threaded trpo; run python main_multi_thread.py to try it
implemented tensorflow distributed trpo
implemented multi-process trpo

future work

complete trpo with image as input
