Yasuhiro Fujita

Results 90 comments of Yasuhiro Fujita

It didn't occur for Space Invaders. For Breakout, we might need to force long episodes to finish.
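One common way to force long episodes to finish is to cap the number of steps per episode. A minimal sketch, assuming a Gym-style `env`/`agent` interface; `max_steps` and the function name are hypothetical, not part of ChainerRL's API:

```python
def run_episode(env, agent, max_steps=10000):
    """Run one episode, forcing termination after max_steps steps.

    max_steps is a hypothetical cap; Breakout episodes can otherwise run
    very long once the agent learns to survive indefinitely.
    """
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```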

Following the authors' feedback, now it's only slightly worse than theirs.

1) No, I don't use it now. 3) It is called "no-op max" in the Nature DQN paper. It adds some randomness to initial states.
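The "no-op max" scheme from the Nature DQN paper can be sketched as follows, assuming a Gym-style `env`; `NOOP_ACTION` and the function name are hypothetical placeholders:

```python
import random

NOOP_ACTION = 0  # hypothetical index of the no-op action


def reset_with_noops(env, noop_max=30):
    """Reset the env, then take a random number (0..noop_max) of no-op
    actions, as in the Nature DQN paper, to randomize initial states."""
    obs = env.reset()
    for _ in range(random.randint(0, noop_max)):
        obs, _, done, _ = env.step(NOOP_ACTION)
        if done:
            obs = env.reset()
    return obs
```

Without this randomization, a deterministic emulator always starts from the same state, and the agent can overfit to a single trajectory.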

Sorry for the problems. I haven't run my code for a while and don't know whether it works with the latest Caffe and ALE. @watts4speed nice work!

Good point. It is difficult to guarantee that we won't compute gradient wrt `mean` or `var` of `GaussianDistribution`, so maybe we should not use `chainer.as_variable` here...

Some facts:
- ~~`ACER` computes gradient wrt `Distribution.params` via `backward`, not `chainer.grad`, so it can be affected by `requires_grad=False`.~~ Now ACER uses `chainer.grad` (#511).
- Currently `chainer.grad` can compute gradient...

Related PRs: https://github.com/chainer/chainerrl/pull/149 https://github.com/chainer/chainerrl/pull/295

Now that the ICLR 2018 version of the Reactor paper has a lot of updates and impressive new results, replicating it is more important than before.

DQN is usually applied to a discrete action space. If you want to tackle a continuous vector-valued action space, I recommend using other algorithms like the ones under https://github.com/chainer/chainerrl/tree/master/examples/mujoco/reproduction.
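The reason is that DQN's greedy policy takes an argmax of Q-values over all actions, which requires the action set to be enumerable. A minimal illustration (the Q-values here are made up):

```python
import numpy as np

# Hypothetical Q-values for a 4-action discrete space.  DQN picks the
# greedy action by enumerating actions and taking the argmax, which is
# infeasible for a continuous vector-valued action space.
q_values = np.array([0.1, 0.5, -0.2, 0.3])
greedy_action = int(np.argmax(q_values))  # -> 1
```

Algorithms like DDPG, TD3, and SAC (covered by the linked examples) instead learn a policy that outputs continuous actions directly, avoiding the argmax.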

Hm, very strange. I cannot reproduce it on my Ubuntu 16.04 machine with CUDA 9.1.

```
$ ipython
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
Type 'copyright', 'credits' or 'license'...
```