Yasuhiro Fujita

Results 90 comments of Yasuhiro Fujita

It didn't occur for Space Invaders. For Breakout, we might need to force long episodes to finish.
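One common way to force long episodes to finish is to cap the number of steps per episode. A minimal sketch, assuming a Gym-style `env`/`agent` interface; `max_steps` and the function name are hypothetical, not part of ChainerRL's API:

```python
def run_episode(env, agent, max_steps=10000):
    """Run one episode, forcing termination after max_steps steps.

    max_steps is a hypothetical cap; Breakout episodes can otherwise run
    very long once the agent learns to survive indefinitely.
    """
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```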

Following the authors' feedback, now it's only slightly worse than theirs.

1) No, I don't use it now. 3) It is called "no-op max" in the Nature DQN paper. It adds some randomness to initial states.
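The "no-op max" scheme from the Nature DQN paper can be sketched as follows, assuming a Gym-style `env`; `NOOP_ACTION` and the function name are hypothetical placeholders:

```python
import random

NOOP_ACTION = 0  # hypothetical index of the no-op action


def reset_with_noops(env, noop_max=30):
    """Reset the env, then take a random number (0..noop_max) of no-op
    actions, as in the Nature DQN paper, to randomize initial states."""
    obs = env.reset()
    for _ in range(random.randint(0, noop_max)):
        obs, _, done, _ = env.step(NOOP_ACTION)
        if done:
            obs = env.reset()
    return obs
```

Without this randomization, a deterministic emulator always starts from the same state, and the agent can overfit to a single trajectory.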

Sorry for the problems. I haven't run my code for a while and don't know whether it works with the latest Caffe and ALE. @watts4speed nice work!

Good point. It is difficult to guarantee that we won't compute gradient wrt `mean` or `var` of `GaussianDistribution`, so maybe we should not use `chainer.as_variable` here...

Some facts:
- ~~`ACER` computes gradient wrt `Distribution.params` via `backward`, not `chainer.grad`, so it can be affected by `requires_grad=False`.~~ Now ACER uses `chainer.grad` (#511).
- Currently `chainer.grad` can compute gradient...

Related PRs: https://github.com/chainer/chainerrl/pull/149 https://github.com/chainer/chainerrl/pull/295

Now that the ICLR 2018 version of the Reactor paper has a lot of updates and impressive new results, replicating it is more important than before.

DQN is usually applied to a discrete action space. If you want to tackle a continuous vector-valued action space, I recommend using other algorithms like the ones under https://github.com/chainer/chainerrl/tree/master/examples/mujoco/reproduction.
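The reason is that DQN's greedy policy takes an argmax of Q-values over all actions, which requires the action set to be enumerable. A minimal illustration (the Q-values here are made up):

```python
import numpy as np

# Hypothetical Q-values for a 4-action discrete space.  DQN picks the
# greedy action by enumerating actions and taking the argmax, which is
# infeasible for a continuous vector-valued action space.
q_values = np.array([0.1, 0.5, -0.2, 0.3])
greedy_action = int(np.argmax(q_values))  # -> 1
```

Algorithms like DDPG, TD3, and SAC (covered by the linked examples) instead learn a policy that outputs continuous actions directly, avoiding the argmax.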

Hm, very strange. I cannot reproduce it on my Ubuntu 16.04 machine with CUDA 9.1.

```
$ ipython
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
Type 'copyright', 'credits' or 'license'...
```