
Add simpler examples

Open muupan opened this issue 8 years ago • 4 comments

Current examples have many options, including which env to solve. That flexibility might be helpful when tackling a new environment, where you have to tune hyperparameters, but as examples for new users, I feel they are too complicated. I think it would be better to add simpler examples that each solve a single predefined env.

muupan avatar Aug 01 '17 04:08 muupan

In particular, do you have a good set of parameters for using ACER in ALE?

I've been running your example train_acer_ale.py with the default options, but it gives me worse sample efficiency than A3C (while also taking noticeably longer per step).

Could be I've just been using the wrong games. Wang et al. unhelpfully give only an aggregate graph, which doesn't show in which specific environments ACER is supposed to have improved sample efficiency.

ElliotWay avatar Aug 06 '17 15:08 ElliotWay

@ElliotWay Interesting. Which games did you try? When I tuned train_acer_ale.py, I found it to be much more sample-efficient than A3C on Breakout with the default parameters.

muupan avatar Aug 11 '17 22:08 muupan

@muupan Breakout, Beam Rider, Pong, and Qbert. I guess if you did get better efficiency on Breakout, there must be something wrong with my setup, though I still get good results from A3C.

ElliotWay avatar Aug 14 '17 00:08 ElliotWay

@ElliotWay Thank you. It is possible there has been some regression in ChainerRL. It should be investigated.

muupan avatar Aug 14 '17 18:08 muupan