
A couple of questions ...

osh opened this issue · 1 comment

  1. The default memory is set up as 50 episodes in your code, while the DeepMind Atari paper uses 1,000,000 -- this seems like a major limitation when training. Can you achieve similar performance with 50?
  2. The DeepMind paper trains for roughly 100 epochs of 50,000 episodes each -- do you see roughly the same training time to proficiency on the same games here, with similar scores?
  3. Looking at a few new environments and value-function networks, I have experienced somewhat unstable average max value-function outputs (growing unboundedly or fluctuating widely). I'm wondering whether it makes sense to do some kind of action-value target clipping or re-scaling, as sketched after this list. The instability seems to occur when the network has difficulty differentiating next states from current states (many of the inputs are the same or indistinguishable, but a few are slightly different). Is this kind of instability expected, and what kind of training time is needed to achieve more stability?
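For concreteness, the stabilization trick in the DeepMind paper is to clip rewards to [-1, 1] and to clip the TD error itself to [-1, 1] before backpropagation. Below is a minimal NumPy sketch of that idea; the function and variable names are illustrative, not from this repo:

```python
import numpy as np

def clipped_td_targets(rewards, next_q_max, dones, gamma=0.99):
    """Build DQN targets with the DeepMind-paper clipping:
    rewards clipped to [-1, 1] before forming the target.
    Names here are illustrative, not from this repo."""
    r = np.clip(rewards, -1.0, 1.0)  # reward clipping
    return r + gamma * next_q_max * (1.0 - dones)

def clipped_td_error(q_pred, targets):
    # Clipping the TD error to [-1, 1] bounds the gradient magnitude
    # (for a squared-error loss this behaves like a Huber loss), which
    # damps the kind of runaway Q-values described above.
    return np.clip(targets - q_pred, -1.0, 1.0)
```

Bounding the error term this way does not fix a network that cannot distinguish states, but it keeps occasional large targets from blowing up the value estimates.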

osh · May 15 '16 03:05

Re: number 1, note that the 1,000,000 refers to frames, whereas the 50 in this code refers to episodes. I am currently trying a 1,000-episode memory (among other hyperparameter changes) and will submit a pull request with those details if it works well. 1,000 episodes seemed to be in the ballpark of 1 million frames (60 fps * ~16 seconds per episode * 1,000 episodes ≈ 1 million; the arithmetic is spelled out below), though episode length in frames will increase over time as performance improves.
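For reference, the back-of-the-envelope conversion, using the rough figures above (assumptions, not measurements):

```python
# Rough conversion from an episode-based memory to the paper's frame count.
# The fps and episode-length figures are assumptions, not measurements.
fps = 60                   # Atari frame rate
seconds_per_episode = 16   # rough episode length early in training
episodes = 1000

frames = fps * seconds_per_episode * episodes
print(frames)  # 960000 -- close to the paper's 1,000,000 frames
```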

milesbrundage · May 16 '16 15:05