Atari
Implement Pop-Art
The paper "Learning functions across many orders of magnitudes" introduces Preserving Outputs Precisely, while Adaptively Rescaling Targets (Pop-Art). In summary, it adaptively normalises the learning targets, which can span several orders of magnitude, while keeping the network's unnormalised outputs unchanged. This removes the need for the reward-clipping heuristic used for Atari games, where clipping effectively turns rewards into counts. The normalisation is also better suited to non-stationary problems, i.e., most real-world problems.
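To make the two halves of the idea concrete, here is a minimal NumPy sketch, not the authors' code. It assumes a scalar linear output layer on top of some feature vector `h`, and running estimates of the target mean and second moment with step size `beta`. The names `PopArtLayer`, `update_stats`, `normalise_target` and `min_sigma` are hypothetical and chosen only for illustration.

```python
import numpy as np


class PopArtLayer:
    """Illustrative scalar output layer with Pop-Art style normalisation."""

    def __init__(self, n_features, beta=0.01, min_sigma=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=n_features)  # normalised weights
        self.b = 0.0                                      # normalised bias
        self.beta = beta            # step size for the running statistics
        self.min_sigma = min_sigma  # assumed floor to avoid division by zero
        self.mu = 0.0               # running mean of the targets
        self.nu = 1.0               # running second moment of the targets

    @property
    def sigma(self):
        # Standard deviation derived from the first and second moments.
        variance = max(self.nu - self.mu ** 2, 0.0)
        return max(np.sqrt(variance), self.min_sigma)

    def normalised(self, h):
        # Output in normalised space; this is what the loss is computed on.
        return self.w @ h + self.b

    def unnormalised(self, h):
        # Output in the original target scale.
        return self.sigma * self.normalised(h) + self.mu

    def update_stats(self, target):
        old_mu, old_sigma = self.mu, self.sigma
        # ART: adaptively rescale targets by updating the running moments.
        self.mu = (1 - self.beta) * old_mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        # POP: rescale weights and bias so unnormalised outputs are preserved.
        self.w = self.w * old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma

    def normalise_target(self, target):
        # Target in normalised space, used for the gradient step.
        return (target - self.mu) / self.sigma
```

In a training loop one would call `update_stats(target)` first and then take a gradient step that moves `normalised(h)` towards `normalise_target(target)`. Because the statistics update leaves `unnormalised(h)` unchanged, a very large target shifts the scale without disturbing the predictions already learned, which is why rewards no longer need to be clipped.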
Below is a picture of extra notes from the authors, displayed next to their poster at NIPS 2016:
