Implement Pop-Art

Open Kaixhin opened this issue 9 years ago • 0 comments

Learning functions across many orders of magnitudes introduces Preserving Outputs Precisely, while Adaptively Rescaling Targets (Pop-Art). In summary, it adaptively normalises the learning targets across orders of magnitude while leaving the network's unnormalised outputs unchanged, which gets rid of the reward clipping heuristic for Atari games (clipping effectively reduces rewards to counts). The normalisation is also better suited to non-stationary problems, i.e. any decent real-world problem.
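For reference, here is a minimal NumPy sketch of the two steps as I understand them from the paper: update running statistics of the targets (the "ART" part), then rescale the last linear layer so the unnormalised outputs stay exactly the same (the "POP" part). The class name `PopArtLayer`, the step size `beta`, and the `min_sigma` floor are illustrative assumptions, not something taken from this repo or guaranteed to match the authors' settings.

```python
import numpy as np


class PopArtLayer:
    """Scalar linear output layer with Pop-Art normalisation (sketch only)."""

    def __init__(self, n_in, beta=0.01, min_sigma=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=n_in)  # weights of the normalised head
        self.b = 0.0                               # bias of the normalised head
        self.mu = 0.0                              # running mean of targets
        self.nu = 1.0                              # running second moment of targets
        self.beta = beta                           # assumed step size for the statistics
        self.min_sigma = min_sigma                 # assumed floor to avoid division by ~0

    @property
    def sigma(self):
        # Standard deviation derived from the first and second moments.
        var = max(self.nu - self.mu ** 2, 0.0)
        return max(float(np.sqrt(var)), self.min_sigma)

    def normalised(self, h):
        # Output in normalised space; this is what the loss is computed on.
        return float(self.w @ h + self.b)

    def unnormalised(self, h):
        # Output in the original scale of the targets.
        return self.sigma * self.normalised(h) + self.mu

    def update_stats(self, target):
        old_mu, old_sigma = self.mu, self.sigma
        # ART: adaptively rescale targets via exponential moving averages.
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        # POP: adjust w and b so unnormalised outputs are preserved.
        self.w = self.w * (old_sigma / new_sigma)
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma

    def normalise_target(self, target):
        # Target in normalised space, to be regressed by normalised(h).
        return (target - self.mu) / self.sigma


layer = PopArtLayer(n_in=3)
h = np.ones(3)                        # stand-in for the last hidden activations
before = layer.unnormalised(h)
layer.update_stats(1000.0)            # a target far outside the current scale
after = layer.unnormalised(h)         # should match `before` up to rounding
```

In a DQN-style agent the gradient step would then be taken on the error between `normalised(h)` and `normalise_target(target)`, so no reward clipping is needed to keep the update magnitudes bounded.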

Below is a picture of extra notes from the authors, next to their poster at NIPS 2016: img_20161207_190428

Apr 13 '16 08:04 Kaixhin