rl_algorithms
Structured implementation of key RL algorithms
Add examples for using a custom environment, a custom agent, and a custom Learner. Fix minor issues (pin the torch version in requirements.txt and add an interim test for ACER).
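As a minimal sketch of what a custom-environment example might look like, assuming the standard `gym.Env` interface (the class name and toy dynamics below are hypothetical, not the repo's actual example):

```python
import gym
import numpy as np
from gym import spaces


class MyCustomEnv(gym.Env):
    """Hypothetical custom environment following the gym.Env interface."""

    def __init__(self):
        super().__init__()
        # One continuous 4-dim observation, two discrete actions.
        self.observation_space = spaces.Box(
            low=-1.0, high=1.0, shape=(4,), dtype=np.float32
        )
        self.action_space = spaces.Discrete(2)
        self._state = np.zeros(4, dtype=np.float32)

    def reset(self):
        self._state = self.observation_space.sample()
        return self._state

    def step(self, action):
        # Toy dynamics: nudge the state and reward staying near zero.
        self._state = np.clip(self._state + (action - 0.5) * 0.1, -1.0, 1.0)
        reward = -float(np.abs(self._state).sum())
        done = False
        return self._state, reward, done, {}
```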
Changes:
- Fix the ACER update to work in log scale.
- `select_action` now returns only the action.
- Add an interim test for ACER.
- Support CNN-based algorithms.
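The log-scale fix presumably refers to computing ACER's importance ratios from log-probabilities rather than raw probabilities, for numerical stability. A minimal sketch of that idea (names and the clipping constant are illustrative, not the repo's actual code):

```python
import torch


def importance_ratio(
    log_pi: torch.Tensor, log_mu: torch.Tensor, log_clip: float = 2.0
) -> torch.Tensor:
    """Compute rho = pi / mu in log space to avoid underflow/overflow.

    log_pi: log-probabilities under the current policy.
    log_mu: log-probabilities under the behavior policy.
    """
    log_rho = log_pi - log_mu
    # Clamp before exponentiating so extreme ratios cannot blow up the update.
    return torch.exp(torch.clamp(log_rho, max=log_clip))
```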
Add run grouping by category when logging with wandb:
- Agent
- Integration test
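A minimal sketch of how such grouping can be done with wandb's built-in `group` argument (the project and group names below are assumptions, not the repo's actual values):

```python
import wandb

# Group runs in the wandb dashboard by category, e.g. regular agent
# training vs. integration tests.
run = wandb.init(
    project="rl_algorithms",      # placeholder project name
    group="integration-test",     # or "agent"
    job_type="test",
)
run.log({"score": 123.0})
run.finish()
```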
Hello, thank you for this fantastic repo! I was hoping to use some of the implementations here for the [MineRL](https://minerl.io/docs/index.html) competition. However, it seems policy-based algorithms like [A2C](https://github.com/medipixel/rl_algorithms/issues/238), PPO, SAC, and...
 
I think this flag name is better than the previous one: `--load-from` -> `--ckpt-path`. If you have other ideas, please leave comments below.
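For context, a minimal sketch of what such a rename looks like with argparse (the parser setup is illustrative, not the repo's actual CLI code):

```python
import argparse

parser = argparse.ArgumentParser(description="Run an RL agent.")
# Renamed from --load-from: --ckpt-path makes it clearer that the
# argument is a path to a checkpoint file.
parser.add_argument(
    "--ckpt-path",
    type=str,
    default=None,
    help="path to the checkpoint to load before training/testing",
)
args = parser.parse_args()

if args.ckpt_path is not None:
    print(f"Loading checkpoint from {args.ckpt_path}")
```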
The A2C algorithm is currently implemented only for continuous environments like LunarLanderContinuous. We should also implement A2C for discrete environments, since its performance can be better there.
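The main change for a discrete variant is the policy head: a categorical distribution over actions instead of a Gaussian. A minimal sketch (class name and network sizes are illustrative):

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class DiscreteActor(nn.Module):
    """Hypothetical A2C actor for a discrete action space."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # logits, one per action
        )

    def forward(self, state: torch.Tensor):
        dist = Categorical(logits=self.net(state))
        action = dist.sample()
        return action, dist.log_prob(action), dist.entropy()
```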
Although RainbowIQN's Pong benchmark is very impressive, it is difficult to reproduce the performance presented in the paper in other Atari environments based on pong_configs.
Although the current training speed is not bad, it takes almost a month (2,685,280 s) to train IQN for 200M frames in an Atari environment. So if we separate the training part and the workers to run in parallel as...
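A minimal sketch of the learner/worker separation being suggested, in the spirit of Ape-X style setups (queue-based decoupling; everything here is illustrative, not the repo's design):

```python
import multiprocessing as mp


def worker(env_id: int, transition_queue: mp.Queue):
    """Actor process: collects transitions and ships them to the learner."""
    for step in range(1000):
        # In a real setup this would step an environment with the current
        # policy; here we push a dummy transition.
        transition_queue.put((env_id, step, "transition"))


def learner(transition_queue: mp.Queue, n_updates: int = 100):
    """Learner process: consumes transitions and performs gradient updates."""
    for _ in range(n_updates):
        batch = [transition_queue.get() for _ in range(32)]
        # ... compute loss on `batch` and update network parameters ...


if __name__ == "__main__":
    queue: mp.Queue = mp.Queue(maxsize=10000)
    actors = [mp.Process(target=worker, args=(i, queue)) for i in range(4)]
    for p in actors:
        p.start()
    learner(queue)
    for p in actors:
        p.terminate()
```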
We should change the Mujoco envs to [Pybullet-gym](https://github.com/benelot/pybullet-gym) envs because the Mujoco license has expired. Pybullet-gym has many continuous-action environments, including Reacher and HalfCheetah.
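Switching is mostly a matter of env IDs: importing `pybulletgym` registers the environments with gym. A small sketch (the env IDs follow pybullet-gym's naming, e.g. `HalfCheetahPyBulletEnv-v0`; verify against the installed version):

```python
import gym
import pybulletgym  # noqa: F401  # registers the PyBullet envs with gym on import

# Drop-in replacements for the Mujoco tasks, e.g.:
#   Reacher-v2     -> ReacherPyBulletEnv-v0
#   HalfCheetah-v2 -> HalfCheetahPyBulletEnv-v0
env = gym.make("HalfCheetahPyBulletEnv-v0")

obs = env.reset()
for _ in range(10):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
```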