Yasuhiro Fujita
I agree TRPO is great, but supporting TRPO is off-topic for this issue. https://github.com/openai/rllab and https://github.com/openai/baselines do such evaluation and comparison really well, so it's good to start from...
Here are DQN's scores on five Atari games: https://github.com/muupan/chainerrl/blob/benchmark-dqn/evaluations/visualize.ipynb
Added DoubleDQN and PAL.
Added DQN with prioritized replay.
When `t_max` is set to None, PCL is expected to work as in the original paper, i.e. sampling a full episode. When `t_max` is set to a non-None value, it is...
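To make the distinction concrete, here is a minimal sketch (not ChainerRL's actual implementation) of how `t_max` changes the rollout length, assuming a generic Gym-style `reset()`/`step()` environment; `DummyEnv` and `collect_rollout` are hypothetical names used only for illustration.

```python
# Sketch of full-episode vs. truncated rollouts controlled by t_max.
import random


class DummyEnv:
    """Toy env: episodes end after at most 20 steps or randomly earlier."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= 20 or random.random() < 0.05
        return 0.0, 1.0, done, {}


def collect_rollout(env, t_max=None):
    """If t_max is None, sample a full episode; otherwise stop after t_max steps."""
    env.reset()
    transitions = []
    done = False
    while not done and (t_max is None or len(transitions) < t_max):
        obs, reward, done, _ = env.step(0)
        transitions.append((obs, reward, done))
    return transitions


print(len(collect_rollout(DummyEnv(), t_max=None)))  # full episode, as in the paper
print(len(collect_rollout(DummyEnv(), t_max=8)))     # at most 8 steps per rollout
```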
> Also, the initial policy of ACER has fluctuating rewards at each start (for example, the first episode reward of Reacher-v1 fluctuates from -500 to -100), so is there some way to...
Another possible cause is the scale of the observation space. `examples/gym/train_ppo_gym.py` normalizes observations so that mean=0 and std=1, while `examples/gym/train_acer_gym.py` doesn't.
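For illustration, here is a rough sketch of the kind of running mean/std normalization the PPO example applies. This is a stand-in written for this comment, not the exact normalizer used in `train_ppo_gym.py`; `RunningObsNormalizer` is a hypothetical name.

```python
# Running mean/std normalization: standardize observations to roughly mean=0, std=1.
import numpy as np


class RunningObsNormalizer:
    """Tracks running statistics of observations and standardizes them."""

    def __init__(self, shape, eps=1e-2):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps

    def update(self, obs_batch):
        obs_batch = np.asarray(obs_batch, dtype=np.float64)
        batch_mean = obs_batch.mean(axis=0)
        batch_var = obs_batch.var(axis=0)
        batch_count = obs_batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Merge batch statistics into the running mean/variance.
        self.mean += delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta ** 2 * self.count * batch_count / total) / total
        self.count = total

    def normalize(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)


norm = RunningObsNormalizer(shape=(3,))
norm.update(np.random.randn(64, 3) * 100.0 + 50.0)   # large-scale observations
print(norm.normalize(np.array([50.0, 50.0, 50.0])))  # roughly zero-centered output
```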
Chainer v2 support has been dropped. `chainerrl.links.Sequence` is designed to work with the old stateful recurrent interface, so we need to switch to the new stateless recurrent interface (#431) to...
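To make the distinction concrete, here is a conceptual sketch (plain NumPy, not ChainerRL's actual classes) contrasting a stateful recurrent link, which keeps its hidden state internally, with a stateless one, which takes and returns the state explicitly; `StatefulRNN` and `StatelessRNN` are hypothetical names.

```python
# Stateful vs. stateless recurrent interfaces, sketched with a toy recurrence.
import numpy as np


class StatefulRNN:
    """Old-style interface: the hidden state lives inside the link."""

    def __init__(self):
        self.h = np.zeros(4)

    def __call__(self, x):
        self.h = np.tanh(self.h + x)  # hidden state mutated in place
        return self.h

    def reset_state(self):
        self.h = np.zeros(4)


class StatelessRNN:
    """New-style interface: the caller passes in and receives the hidden state."""

    def __call__(self, x, h):
        new_h = np.tanh(h + x)
        return new_h, new_h


x = np.ones(4)

stateful = StatefulRNN()
y1 = stateful(x)                    # state is hidden inside the object

stateless = StatelessRNN()
y2, h = stateless(x, np.zeros(4))   # state is threaded explicitly by the caller
```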
I don't think I've experienced it. Can you provide more details?
I have no idea how that can happen. Can you provide code that reproduces this issue?