Yasuhiro Fujita
I agree TRPO is great, but supporting TRPO is off-topic for this issue. https://github.com/openai/rllab and https://github.com/openai/baselines do such evaluation and comparison really well, so it's good to start from...
Here are DQN's scores on five Atari games: https://github.com/muupan/chainerrl/blob/benchmark-dqn/evaluations/visualize.ipynb
Added DoubleDQN and PAL.
Added DQN with prioritized replay.
When `t_max` is set to None, PCL is expected to work as in the original paper, i.e. sampling a full episode. When `t_max` is set to a non-None value, it is...
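To make the distinction concrete, here is a minimal sketch (not ChainerRL's actual implementation) of how `t_max` changes the rollout length, assuming a generic Gym-style `reset()`/`step()` environment; `DummyEnv` and `collect_rollout` are hypothetical names used only for illustration.

```python
# Sketch of full-episode vs. truncated rollouts controlled by t_max.
import random


class DummyEnv:
    """Toy env: episodes end after at most 20 steps or randomly earlier."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= 20 or random.random() < 0.05
        return 0.0, 1.0, done, {}


def collect_rollout(env, t_max=None):
    """If t_max is None, sample a full episode; otherwise stop after t_max steps."""
    env.reset()
    transitions = []
    done = False
    while not done and (t_max is None or len(transitions) < t_max):
        obs, reward, done, _ = env.step(0)
        transitions.append((obs, reward, done))
    return transitions


print(len(collect_rollout(DummyEnv(), t_max=None)))  # full episode, as in the paper
print(len(collect_rollout(DummyEnv(), t_max=8)))     # at most 8 steps per rollout
```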
> Also, the initial policy of ACER has fluctuating rewards at each start (for example, the first episode reward of Reacher-v1 fluctuates from -500 to -100), so is there some way to...
Another possible cause is the scale of the observation space. `examples/gym/train_ppo_gym.py` normalizes observations so that mean=0 and std=1, while `examples/gym/train_acer_gym.py` doesn't.
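For illustration, here is a rough sketch of the kind of running mean/std normalization the PPO example applies. This is a stand-in written for this comment, not the exact normalizer used in `train_ppo_gym.py`; `RunningObsNormalizer` is a hypothetical name.

```python
# Running mean/std normalization: standardize observations to roughly mean=0, std=1.
import numpy as np


class RunningObsNormalizer:
    """Tracks running statistics of observations and standardizes them."""

    def __init__(self, shape, eps=1e-2):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps

    def update(self, obs_batch):
        obs_batch = np.asarray(obs_batch, dtype=np.float64)
        batch_mean = obs_batch.mean(axis=0)
        batch_var = obs_batch.var(axis=0)
        batch_count = obs_batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Merge batch statistics into the running mean/variance.
        self.mean += delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta ** 2 * self.count * batch_count / total) / total
        self.count = total

    def normalize(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)


norm = RunningObsNormalizer(shape=(3,))
norm.update(np.random.randn(64, 3) * 100.0 + 50.0)   # large-scale observations
print(norm.normalize(np.array([50.0, 50.0, 50.0])))  # roughly zero-centered output
```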
Chainer v2 support has been dropped. `chainerrl.links.Sequence` is designed to work with the old stateful recurrent interface, so we need to switch to the new stateless recurrent interface (#431) to...
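To make the distinction concrete, here is a conceptual sketch (plain NumPy, not ChainerRL's actual classes) contrasting a stateful recurrent link, which keeps its hidden state internally, with a stateless one, which takes and returns the state explicitly; `StatefulRNN` and `StatelessRNN` are hypothetical names.

```python
# Stateful vs. stateless recurrent interfaces, sketched with a toy recurrence.
import numpy as np


class StatefulRNN:
    """Old-style interface: the hidden state lives inside the link."""

    def __init__(self):
        self.h = np.zeros(4)

    def __call__(self, x):
        self.h = np.tanh(self.h + x)  # hidden state mutated in place
        return self.h

    def reset_state(self):
        self.h = np.zeros(4)


class StatelessRNN:
    """New-style interface: the caller passes in and receives the hidden state."""

    def __call__(self, x, h):
        new_h = np.tanh(h + x)
        return new_h, new_h


x = np.ones(4)

stateful = StatefulRNN()
y1 = stateful(x)                    # state is hidden inside the object

stateless = StatelessRNN()
y2, h = stateless(x, np.zeros(4))   # state is threaded explicitly by the caller
```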
I don't think I've experienced it. Can you provide more details?
I have no idea how that can happen. Can you provide code that reproduces this issue?