
Add simpler examples

Open muupan opened this issue 8 years ago • 4 comments

Current examples have many options, including which env to solve. That flexibility might be helpful when tackling a new environment, where you have to tune hyperparameters, but as examples for new users, I feel they are too complicated. I think it would be better to add simpler examples that each solve a single predefined env.

muupan avatar Aug 01 '17 04:08 muupan

In particular, do you have a good set of parameters for using ACER in ALE?

I've been running your example train_acer_ale.py with the default options, but it gives me worse sample efficiency than A3C (while also taking noticeably longer per step).

Could be I've just been using the wrong games. Wang et al. unhelpfully give only an aggregate graph, which doesn't show in which specific environments ACER is supposed to have improved sample efficiency.

ElliotWay avatar Aug 06 '17 15:08 ElliotWay

@ElliotWay Interesting. Which games did you try? When I tuned train_acer_ale.py, I found it to be much more sample-efficient than A3C on Breakout with the default parameters.

muupan avatar Aug 11 '17 22:08 muupan

@muupan Breakout, Beam Rider, Pong, and Qbert. I guess if you did get better efficiency on Breakout, there must be something wrong with my setup, though I still get good results from A3C.

ElliotWay avatar Aug 14 '17 00:08 ElliotWay

@ElliotWay Thank you. It is possible there has been some regression in ChainerRL. It should be investigated.

muupan avatar Aug 14 '17 18:08 muupan