trpo
Trust region policy optimization (TRPO) based on gym and TensorFlow; it can run in distributed mode.
The algorithm has recently been moved to https://github.com/jjkke88/RL_toolbox.
There are three versions of TRPO: one for discrete action spaces, such as MountainCar; one for discrete action spaces with images as input, such as Atari games; and one for continuous action spaces, such as Pendulum.
The environments are based on OpenAI Gym.
Part of the code is adapted from rllab.
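For reference, this is the constrained problem TRPO solves at each iteration, as formulated in the original TRPO paper (not taken from this repository's code). Here g is the gradient of the surrogate objective, F is the Hessian of the mean KL (the Fisher information matrix), and δ is the trust-region size. The direction s is found approximately with conjugate gradient, and the maximal step βs is then shrunk by a backtracking line search.

```latex
\begin{aligned}
&\max_{\theta}\; \mathbb{E}_{s,a \sim \pi_{\theta_{\mathrm{old}}}}\!\left[\frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)}\, A^{\pi_{\theta_{\mathrm{old}}}}(s,a)\right] \\
&\text{subject to}\;\; \mathbb{E}_{s \sim \pi_{\theta_{\mathrm{old}}}}\!\left[D_{\mathrm{KL}}\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s)\,\|\,\pi_{\theta}(\cdot \mid s)\big)\right] \le \delta \\[4pt]
&F s = g, \qquad \theta_{\mathrm{new}} = \theta_{\mathrm{old}} + \beta s, \qquad \beta = \sqrt{\frac{2\delta}{s^{\top} F s}}
\end{aligned}
```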
dependencies
- TensorFlow 0.10
- prettytensor
- the latest OpenAI Gym
code structure
- baseline: estimation of the baseline function
- checkpoint: folder for storing model files; it must not be deleted, otherwise errors will occur
- distribution: distribution base classes, used to calculate probabilities of distributions such as a Gaussian
- logger: contains a Logger class for logging data to a .csv file
- agent: agents for discrete and continuous action spaces
- log: stores log files
- experiment: contains several different main files; running a main file starts training or testing
- environment.py: environment
- krylov.py: implementation of some numerical methods: the conjugate gradient method and Hessian matrix calculation
- parameters.py: config file
- utils.py: implementation of some basic functions: getFlat, setFlat, lineaSearch
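The baseline module estimates state values that are subtracted from the returns to reduce the variance of the policy gradient. A minimal sketch, assuming a linear feature baseline in the style of rllab's (ridge regression on clipped observations, their squares, and polynomial time features). The class name LinearFeatureBaseline, the feature set, and the path dictionary layout are illustrative assumptions, not this repository's API.

```python
import numpy as np


class LinearFeatureBaseline:
    """Hypothetical value baseline fit by ridge regression on hand-made features.

    Modeled on rllab's linear feature baseline; features and the
    regularization constant are assumptions for illustration.
    """

    def __init__(self, reg_coeff=1e-5):
        self.reg_coeff = reg_coeff
        self.coeffs = None

    def _features(self, observations):
        o = np.clip(observations, -10, 10)
        # time step within the episode, rescaled to keep polynomial terms small
        t = np.arange(len(o)).reshape(-1, 1) / 100.0
        ones = np.ones((len(o), 1))
        return np.concatenate([o, o ** 2, t, t ** 2, t ** 3, ones], axis=1)

    def fit(self, paths):
        # paths: list of dicts with "observations" (T x obs_dim) and "returns" (T,)
        X = np.concatenate([self._features(p["observations"]) for p in paths])
        y = np.concatenate([p["returns"] for p in paths])
        # ridge regression: (X^T X + reg * I) w = X^T y
        A = X.T @ X + self.reg_coeff * np.eye(X.shape[1])
        self.coeffs = np.linalg.solve(A, X.T @ y)

    def predict(self, observations):
        if self.coeffs is None:  # not fitted yet: predict a zero baseline
            return np.zeros(len(observations))
        return self._features(observations) @ self.coeffs
```

Advantages for the policy update would then be the returns minus baseline.predict(observations).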
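For the continuous-action version, the policy is commonly a diagonal Gaussian. The sketch below shows the two quantities such a distribution class must provide for TRPO: the log-likelihood, used in the probability ratio, and the KL divergence, used in the trust-region constraint. It assumes the distribution is parameterized by a mean and a log standard deviation; DiagGaussian is a hypothetical name, not the repository's class.

```python
import numpy as np


class DiagGaussian:
    """Hypothetical diagonal Gaussian parameterized by mean and log std."""

    def __init__(self, dim):
        self.dim = dim

    def log_likelihood(self, x, mean, log_std):
        # log N(x; mean, diag(exp(log_std)^2)), summed over action dimensions
        z = (x - mean) / np.exp(log_std)
        return (-0.5 * np.sum(z ** 2, axis=-1)
                - np.sum(log_std, axis=-1)
                - 0.5 * self.dim * np.log(2.0 * np.pi))

    def kl(self, old_mean, old_log_std, new_mean, new_log_std):
        # KL(old || new) for diagonal Gaussians, summed over action dimensions
        old_var = np.exp(2.0 * old_log_std)
        new_var = np.exp(2.0 * new_log_std)
        return np.sum(
            new_log_std - old_log_std
            + (old_var + (old_mean - new_mean) ** 2) / (2.0 * new_var)
            - 0.5,
            axis=-1,
        )
```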
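TRPO uses conjugate gradient to solve F s = g for the search direction without forming the Fisher matrix F explicitly, so only matrix-vector products are needed. A minimal NumPy sketch of the method; the function name, arguments, and defaults are assumptions and are not the signature used in krylov.py.

```python
import numpy as np


def conjugate_gradient(f_Ax, b, max_iters=10, residual_tol=1e-10):
    """Approximately solve A x = b for symmetric positive-definite A.

    f_Ax is a callable returning the product A @ v, so A is never formed
    explicitly (in TRPO, A would be the Fisher matrix / Hessian of the KL).
    """
    x = np.zeros_like(b, dtype=float)
    r = np.array(b, dtype=float)  # residual b - A x, with x starting at 0
    p = r.copy()                  # first search direction
    rdotr = r.dot(r)
    for _ in range(max_iters):
        if rdotr < residual_tol:  # squared residual small enough: converged
            break
        Ap = f_Ax(p)
        alpha = rdotr / p.dot(Ap)  # exact step length along p
        x += alpha * p
        r -= alpha * Ap
        new_rdotr = r.dot(r)
        p = r + (new_rdotr / rdotr) * p  # next A-conjugate direction
        rdotr = new_rdotr
    return x
```

In TRPO, f_Ax would compute a Fisher-vector product (often with a small damping term added); in a quick check, an explicit small matrix such as lambda v: A @ v can stand in for it.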
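Once a full step has been computed, TRPO shrinks it with a backtracking line search until the surrogate loss improves by a sufficient fraction of the first-order prediction. A sketch modeled on the line search found in common TRPO implementations; the name linesearch, its arguments, and the acceptance ratio of 0.1 are assumptions and may differ from lineaSearch in utils.py.

```python
import numpy as np


def linesearch(f, x, fullstep, expected_improve_rate,
               max_backtracks=10, accept_ratio=0.1):
    """Backtracking line search that tries to decrease f along fullstep.

    f: maps a flat parameter vector to the surrogate loss (lower is better).
    expected_improve_rate: predicted decrease of f for the full step,
        from a first-order approximation (e.g. -gradient.dot(fullstep)).
    Returns (success, new_x); on failure the original x is returned.
    """
    fval = f(x)
    # try step fractions 1, 1/2, 1/4, ...
    for stepfrac in 0.5 ** np.arange(max_backtracks):
        xnew = x + stepfrac * fullstep
        actual_improve = fval - f(xnew)
        expected_improve = expected_improve_rate * stepfrac
        # accept if the loss dropped by enough of the predicted amount
        if actual_improve > 0 and actual_improve / expected_improve > accept_ratio:
            return True, xnew
    return False, x
```

Returning the original parameters on failure keeps the policy unchanged for that iteration rather than accepting a step that made the surrogate worse.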
recent work
- implemented multi-threaded TRPO; run python main_multi_thread.py to try it
- implemented distributed TRPO with TensorFlow
- implemented multi-process TRPO
future work
- complete TRPO with images as input