Yasuhiro Fujita


I agree TRPO is great, but supporting TRPO is off-topic for this issue. https://github.com/openai/rllab and https://github.com/openai/baselines already do such evaluations and comparisons really well, so it's good to start from...

Here are DQN's scores on five Atari games: https://github.com/muupan/chainerrl/blob/benchmark-dqn/evaluations/visualize.ipynb

When `t_max` is set to None, PCL is expected to work as in the original paper, i.e. sampling a full episode. When `t_max` is set to a non-None value, it is...
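
To illustrate the difference, here is a minimal sketch of how a `t_max` switch can control rollout length; `collect_rollout`, `env`, and `policy` are hypothetical names for illustration, not ChainerRL's actual PCL internals.

```python
def collect_rollout(env, policy, t_max=None):
    """Sample a full episode if t_max is None, else at most t_max steps."""
    obs = env.reset()
    trajectory = []
    done, t = False, 0
    while not done and (t_max is None or t < t_max):
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, done))
        t += 1
    return trajectory
```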

> Also, the initial policy of ACER gets fluctuating rewards across runs (for example, the first-episode reward of Reacher-v1 fluctuates from -500 to -100), so is there some way to...

Another possible cause is the scale of the observation space. `examples/gym/train_ppo_gym.py` normalizes observations so that mean=0, std=1, while `examples/gym/train_acer_gym.py` doesn't.
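
If you want to try the same normalization with ACER, here is a minimal sketch of a running mean/std observation normalizer; the class and its update rule are illustrative, not the exact link the PPO example uses.

```python
import numpy as np

class RunningObsNormalizer:
    """Illustrative running normalizer: keeps mean/var estimates and
    rescales observations to roughly mean=0, std=1."""

    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps

    def update(self, batch):
        # Merge batch statistics into the running estimates.
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.var = m2 / total
        self.count = total

    def __call__(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)
```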

Chainer v2 support has been dropped. `chainerrl.links.Sequence` is designed to work with the old stateful recurrent interface, so we need to switch to the new stateless recurrent interface (#431) to...
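
As a rough sketch of what the migration looks like, assuming `chainerrl.links.StatelessRecurrentSequential` from #431 wrapping Chainer's NStep RNN links (sizes here are illustrative):

```python
import chainer.functions as F
import chainer.links as L
from chainerrl.links import StatelessRecurrentSequential

obs_size, n_actions = 4, 2  # illustrative sizes

model = StatelessRecurrentSequential(
    L.Linear(obs_size, 64),
    F.relu,
    L.NStepLSTM(1, 64, 64, 0),  # NStep links keep no hidden state inside
    L.Linear(64, n_actions),
)
# Unlike the old stateful interface, the recurrent state is passed in and
# returned explicitly (roughly: y, next_state = model(xs, prev_state))
# instead of being stored inside the link.
```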

I don't think I've experienced it. Can you provide more details?

I have no idea how that can happen. Can you provide code that reproduces this issue?