Not sample efficient enough
According to Figure 6 in the paper, their A3C needs only 20 epochs (20 million steps) to reach an average score of around 400 on Breakout. My current implementation needs more.

After applying the authors' feedback, it is now only slightly worse than theirs.
@muupan Thank you for sharing your implementation and settings, along with such great results!
Your wiki helps a lot, and I'm going to try your settings.
Let me ask about a couple of things that are not covered in the wiki.
- There is loss normalization code for the case where a sequence terminates in the middle:
https://github.com/muupan/async-rl/blob/master/a3c.py#L113-L118
Are you using this now?
- There is action-skipping code in ALE#initialize():
https://github.com/muupan/async-rl/blob/master/ale.py#L146-L149
What is this for?
I'm also going to adjust my parameters as described in your wiki. Thanks!!
- No, I don't use it now.
- It is called "no-op max" in the Nature DQN paper. It adds some randomness to initial states.
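
To illustrate the "no-op max" idea, here is a minimal sketch, not the actual code in ale.py. `CountingEnv`, `reset_with_noops`, and the parameter names are hypothetical and used only for illustration. The default of 30 follows the "no-op max" value listed in the Nature DQN paper; sampling the count uniformly from 0 to `noop_max` inclusive and using action 0 as the no-op are assumptions.

```python
import random


class CountingEnv:
    """Hypothetical stand-in for an emulator: the state is just a frame counter."""

    def __init__(self):
        self.frame = 0

    def reset(self):
        self.frame = 0

    def act(self, action):
        # Every action, including the no-op, advances the emulator by one frame.
        self.frame += 1
        return 0  # dummy reward


def reset_with_noops(env, rng, noop_max=30, noop_action=0):
    """Reset env, then take a random number of no-op actions.

    The count is drawn uniformly from [0, noop_max] (an assumption), so each
    episode starts from a slightly different emulator state.
    """
    env.reset()
    n_noops = rng.randint(0, noop_max)  # inclusive on both ends
    for _ in range(n_noops):
        env.act(noop_action)
    return n_noops


if __name__ == "__main__":
    rng = random.Random(0)
    env = CountingEnv()
    for episode in range(3):
        n = reset_with_noops(env, rng)
        print(f"episode {episode}: {n} no-ops, starting at frame {env.frame}")
```

Without this step, a deterministic emulator would begin every episode from exactly the same state; the random number of no-ops is what varies the starting point.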
I see. Thank you!