async-rl Not sample efficient enough

According to Figure 6 in the paper, their A3C needs only 20 epochs (20 million steps) to reach an average score of around 400 on Breakout. My current implementation needs more.

May 08 '16 09:05 muupan

After following the authors' feedback, my implementation is now only slightly worse than theirs.

May 10 '16 09:05 muupan

@muupan Thank you for sharing your implementation and settings, along with the great results!

Your wiki helps a lot, and I'm going to try your settings.

Let me ask about two things that are not covered in the wiki.

There is loss normalization code for the case where a sequence terminates partway through:

https://github.com/muupan/async-rl/blob/master/a3c.py#L113-L118

Are you using this now?
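
For readers who have not opened the linked lines, here is a minimal sketch of what a normalization of this kind could look like. It assumes the idea is to rescale the losses of a sequence that ended before `t_max` steps so that its total weight matches a full-length sequence. The function name and arguments are hypothetical and are not taken from `a3c.py`.

```python
def rescale_truncated_loss(pi_loss, v_loss, n_steps, t_max):
    """Scale policy and value losses of a sequence cut short by a terminal state.

    Hypothetical helper: assumes the losses are sums over the n_steps
    transitions actually collected, and that t_max is the nominal length.
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    if n_steps < t_max:
        # A short sequence contributes fewer terms, so scale it up to the
        # weight that a full t_max-step sequence would have had.
        factor = t_max / n_steps
        pi_loss *= factor
        v_loss *= factor
    return pi_loss, v_loss
```

Full-length sequences pass through unchanged, so a switch like this only affects updates that hit a terminal state early.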

There is also some action-skipping code in ALE#initialize():

https://github.com/muupan/async-rl/blob/master/ale.py#L146-L149

What is this for?

I'm going to adjust my parameters as described in your wiki. Thanks!!

May 10 '16 13:05 miyosuda

No, I don't use the loss normalization now.
The action skipping at initialization is what the Nature DQN paper calls "no-op max". It adds some randomness to the initial states.
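
As an illustration of the "no-op max" idea, here is a minimal sketch: after each reset, the agent takes a random number of "do nothing" actions, so episodes do not all start from the same frame. The toy `CountingEnv`, the function name, and the `reset()`/`step()` interface are assumptions made for this example and are not the API of `ale.py`. The default of 30 follows the Nature DQN hyperparameter table, and action `0` is assumed to be the no-op.

```python
import random


class CountingEnv:
    """Toy stand-in for an emulator: the observation is the frame index."""

    def __init__(self):
        self.frame = 0

    def reset(self):
        self.frame = 0
        return self.frame

    def step(self, action):
        self.frame += 1
        return self.frame, 0.0, False  # observation, reward, done


def reset_with_noops(env, noop_max=30, noop_action=0, rng=random):
    """Reset env, then take between 0 and noop_max no-op actions."""
    obs = env.reset()
    n_noops = rng.randint(0, noop_max)  # both bounds inclusive
    for _ in range(n_noops):
        obs, _reward, done = env.step(noop_action)
        if done:
            # Rare, but restart if the episode ends during the no-ops.
            obs = env.reset()
    return obs, n_noops


if __name__ == "__main__":
    obs, n = reset_with_noops(CountingEnv(), rng=random.Random(0))
    print("started after", n, "no-op frames; observation =", obs)
```

Passing a seeded `random.Random` as `rng` keeps the starting states reproducible across runs.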

May 10 '16 13:05 muupan

I see. Thank you!

May 10 '16 14:05 miyosuda