
Exploration with pseudo counts

Status: Open. lake4790k opened this issue 8 years ago • 5 comments

A new paper describes a method that performs well on Montezuma's Revenge. The implementation could be used with both DDQN with experience replay (ER) and async A3C. The probability used for the pseudo-count is computed with Context Tree Switching (CTS), which could be implemented based on this implementation.
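
As we read the paper, the pseudo-count comes from two probabilities of the density model: ρ(x) before observing x, and the "recoding" probability ρ'(x) after one more update on x, giving N̂(x) = ρ(x)(1 - ρ'(x)) / (ρ'(x) - ρ(x)). Below is a minimal Python sketch under stated assumptions: `EmpiricalModel` is a hypothetical toy model (plain relative frequencies, not CTS), chosen only because with it the pseudo-count recovers the true count exactly, and the bonus form β/√(N̂ + 0.01) with β = 0.05 is an assumption to check against the paper.

```python
import math
from collections import Counter


def pseudo_count(rho: float, rho_prime: float) -> float:
    """Pseudo-count N = rho * (1 - rho') / (rho' - rho).

    rho:       model probability of x before observing it again
    rho_prime: recoding probability of x after one more update on x
    """
    if rho_prime <= rho:
        # No prediction gain: treat x as if it had been seen infinitely often.
        return math.inf
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def exploration_bonus(n_hat: float, beta: float = 0.05) -> float:
    """Assumed bonus form beta / sqrt(N + 0.01); beta and 0.01 are placeholders."""
    return beta / math.sqrt(n_hat + 0.01)


class EmpiricalModel:
    """Hypothetical toy density model: relative frequency of each observation."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def prob(self, x) -> float:
        return self.counts[x] / self.total if self.total else 0.0

    def recoding_prob(self, x) -> float:
        # Probability of x if the model were updated on x once more.
        return (self.counts[x] + 1) / (self.total + 1)

    def update(self, x) -> None:
        self.counts[x] += 1
        self.total += 1


model = EmpiricalModel()
for obs in ["a", "a", "b"]:
    model.update(obs)
n_hat = pseudo_count(model.prob("a"), model.recoding_prob("a"))  # recovers the true count, 2
bonus = exploration_bonus(n_hat)
```

With a learned model such as CTS the two probabilities no longer reduce to counts, so N̂ becomes a generalised count that can be non-zero for states never seen exactly.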

lake4790k, Jun 08 '16 19:06

My first step is to implement a CTS-based probability measure for small bitmaps (1-bit pixels) using the location-dependent model described in the paper. I expect reasonable probabilities for patterns that have already been processed (close to 1), for patterns similar to those (>0.5), and for dissimilar patterns (close to 0).
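
A location-dependent factored model can be sketched as one small predictor per pixel position, conditioned on neighbours that come earlier in raster order, with the bitmap probability being the product of per-pixel probabilities. This minimal sketch swaps in a plain Krichevsky-Trofimov (KT) estimator per (row, column, context) instead of a full CTS tree, and the neighbourhood (left, up-left, up, up-right) and zero padding at the borders are assumptions, not necessarily what the paper uses.

```python
import math


class LocationKTModel:
    """Factored bitmap model: one KT estimator per (row, col, neighbour context)."""

    def __init__(self, height: int, width: int) -> None:
        self.h, self.w = height, width
        self.counts = {}  # (row, col, context) -> [count of 0s, count of 1s]

    def _pixel(self, img, r, c):
        # Out-of-bounds neighbours read as 0 (assumption).
        return img[r][c] if 0 <= r < self.h and 0 <= c < self.w else 0

    def _context(self, img, r, c):
        # Assumed causal neighbourhood: left, up-left, up, up-right.
        return (self._pixel(img, r, c - 1), self._pixel(img, r - 1, c - 1),
                self._pixel(img, r - 1, c), self._pixel(img, r - 1, c + 1))

    def log_prob(self, img) -> float:
        """Log probability of a bitmap: sum of per-pixel log probabilities."""
        total = 0.0
        for r in range(self.h):
            for c in range(self.w):
                n0, n1 = self.counts.get((r, c, self._context(img, r, c)), (0, 0))
                p_one = (n1 + 0.5) / (n0 + n1 + 1.0)  # KT estimate of P(pixel = 1)
                total += math.log(p_one if img[r][c] else 1.0 - p_one)
        return total

    def update(self, img) -> None:
        for r in range(self.h):
            for c in range(self.w):
                key = (r, c, self._context(img, r, c))
                self.counts.setdefault(key, [0, 0])[img[r][c]] += 1


model = LocationKTModel(2, 2)
seen = [[1, 0], [0, 1]]
unseen = [[0, 1], [1, 0]]
for _ in range(100):
    model.update(seen)
print(math.exp(model.log_prob(seen)), math.exp(model.log_prob(unseen)))
```

After training, the processed pattern gets a probability close to 1 and the inverted pattern a probability close to 0, which matches the sanity check described above.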

lake4790k, Jun 16 '16 14:06

Good luck! I finally got round to reading the paper and noticed some extra details in the appendix. For completeness, it seems we'll need to add a stochastic ALE setting for this paper and the PAL paper, and also remove the terminal signal on life loss for this paper. That looks like it can make a huge difference to the reported results.
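
One common way to make ALE stochastic is "sticky actions": with probability p the previously executed action is repeated instead of the agent's choice. ALE exposes this as its `repeat_action_probability` setting, applied per emulator frame; the sketch below applies it per agent decision instead. Whether this matches the appendix is an assumption, and p = 0.25 and NOOP = 0 are placeholders. The `is_terminal` helper is a hypothetical illustration of dropping the life-loss terminal signal.

```python
import random


class StickyActions:
    """Decision-level sticky actions: with probability p, repeat the previous action."""

    def __init__(self, p: float = 0.25, noop: int = 0, seed=None) -> None:
        self.p = p
        self.noop = noop  # assumed NOOP action index, used after a reset
        self.rng = random.Random(seed)
        self.prev = noop

    def reset(self) -> None:
        self.prev = self.noop

    def select(self, agent_action: int) -> int:
        action = self.prev if self.rng.random() < self.p else agent_action
        self.prev = action
        return action


def is_terminal(game_over: bool, life_lost: bool, terminal_on_life_loss: bool) -> bool:
    """With terminal_on_life_loss=False, only the real end of the game ends the episode."""
    return game_over or (terminal_on_life_loss and life_lost)


sticky = StickyActions(p=0.25, seed=0)
actions = [sticky.select(a) for a in [1, 2, 3, 4]]
```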

Kaixhin, Jun 16 '16 16:06

FYI there's another (new) paper from DeepMind with similar goals...

Kaixhin, Jun 16 '16 16:06

With regard to CTS usage, the paper cites a number of other papers as "similar to this and that", but in the end those papers do quite different things, so it's best to follow just the method in the pseudo-count paper. It also cites the Skipping CTS paper but always talks about CTS, so I'm using plain CTS for now.
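
Plain CTS is fairly involved, so as a simpler stand-in this sketch shows binary Context Tree Weighting (CTW), which CTS generalises by replacing CTW's fixed 1/2 mixing at each node with switching weights. This is CTW, not CTS: every node on the context path keeps a KT estimate P_e, and internal nodes mix it with the product of their children's weighted probabilities, P_w = 1/2 P_e + 1/2 P_w(child0) P_w(child1). The fixed depth and the most-recent-first context ordering are assumptions of the sketch.

```python
import math


def _log_add(a: float, b: float) -> float:
    """log(exp(a) + exp(b)) computed stably."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


class CTW:
    """Binary Context Tree Weighting with fixed depth.

    Nodes are keyed by context suffixes (most recent bit first) and store
    (zeros, ones, log KT estimate, log weighted probability).
    """

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.nodes = {}

    def _node(self, key):
        return self.nodes.get(key, (0, 0, 0.0, 0.0))

    def _updated_path(self, context, bit):
        """Node values along the context path after seeing `bit`, without mutating."""
        assert len(context) >= self.depth
        ctx = tuple(context[: self.depth])
        new, child_pw = {}, 0.0
        for d in range(self.depth, -1, -1):  # leaf first, root last
            key = ctx[:d]
            n0, n1, log_pe, log_pw = self._node(key)
            log_pe += math.log(((n1 if bit else n0) + 0.5) / (n0 + n1 + 1.0))
            n0, n1 = (n0, n1 + 1) if bit else (n0 + 1, n1)
            if d == self.depth:
                log_pw = log_pe  # leaves use the KT estimate only
            else:
                # The off-path child is unchanged by this update.
                sibling_pw = self._node(key + (1 - ctx[d],))[3]
                log_pw = math.log(0.5) + _log_add(log_pe, child_pw + sibling_pw)
            new[key] = (n0, n1, log_pe, log_pw)
            child_pw = log_pw
        return new

    def prob(self, context, bit) -> float:
        """Predictive probability of `bit` given `context`."""
        return math.exp(self._updated_path(context, bit)[()][3] - self._node(())[3])

    def update(self, context, bit) -> None:
        self.nodes.update(self._updated_path(context, bit))


ctw = CTW(depth=1)
for _ in range(50):
    ctw.update((1,), 1)
    ctw.update((0,), 0)
```

Because `prob` computes the would-be root probability without mutating the tree, the same model also gives the recoding probability ρ' needed for the pseudo-count.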

I've managed to adapt the CTS code to give reasonable probabilities for 1-bit-pixel bitmaps using the neighbour factors from the paper. The paper doesn't describe exactly how the multiple bits of a single pixel are handled; this could be done in several ways (for a single bit, look at the same bit in the neighbouring pixels, or look at all bits in the neighbouring pixels). I'll add options for these and provide a native library with an FFI interface that can be invoked from ER and async to compute the pseudo-counts from the probabilities.
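
The two options for multi-bit pixels can be sketched as two ways of building the context for one bit of a pixel. The function names and the `"same_bit"` / `"all_bits"` mode strings are hypothetical, and the neighbourhood (left, up-left, up, up-right, zero-padded) is an assumption. Note the size difference: with 8-bit pixels the first option gives 4 context bits per prediction and the second gives 32, which is one reason a tree model that can fall back on shorter contexts is attractive.

```python
def neighbours(img, r, c):
    """Left, up-left, up, up-right pixel values; out-of-bounds pixels read as 0."""
    h, w = len(img), len(img[0])
    out = []
    for dr, dc in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
        rr, cc = r + dr, c + dc
        out.append(img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0)
    return out


def bit_context(img, r, c, bit, mode, bits_per_pixel=8):
    """Context bits for bit `bit` of pixel (r, c) under one of two options."""
    ns = neighbours(img, r, c)
    if mode == "same_bit":
        # Option 1: only the same bit plane of each neighbour (4 context bits).
        return tuple((v >> bit) & 1 for v in ns)
    if mode == "all_bits":
        # Option 2: every bit of every neighbour (4 * bits_per_pixel context bits).
        return tuple((v >> b) & 1 for v in ns for b in range(bits_per_pixel))
    raise ValueError(f"unknown mode: {mode}")


img = [[0b101, 0b011],
       [0b110, 0b000]]
same = bit_context(img, 1, 1, 0, "same_bit")
every = bit_context(img, 1, 1, 0, "all_bits", bits_per_pixel=3)
```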

lake4790k, Jun 17 '16 15:06

I've more or less finished a separate module with the native probability tree for 8-bit screens. It was not easy, but the difficult part probably comes now. For example, the probability of the screen is the product of the probabilities of its pixels. Different implementations (CTW and CTS) compute slightly different probabilities, but with 42 * 42 * 8 factors the products can be very different (e.g. 0.99 vs 0.99999 per factor, each raised to the power 42 * 42 * 8). One would probably need to do exactly what DM did to make it work... let's try anyway.
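
The arithmetic can be made concrete: 42 * 42 * 8 = 14112 factors, so 0.99 per factor gives about 10^-61.6 for the screen while 0.99999 per factor gives about 0.868. With less extreme per-factor probabilities (e.g. 0.9) the product falls well below the smallest positive double (about 5e-324) and underflows to 0.0, so a sketch like this one, which is an assumption rather than the paper's code, keeps everything in log space. Since N̂ = ρ(1 - ρ')/(ρ' - ρ) equals (1 - ρ')/(ρ'/ρ - 1), the pseudo-count can be computed directly from the two log probabilities with `math.expm1`.

```python
import math

N_FACTORS = 42 * 42 * 8  # one binary factor per bit of a 42x42 8-bit screen


def log10_screen_prob(p_per_factor: float, n: int = N_FACTORS) -> float:
    """log10 of a product of n identical factors, computed without underflow."""
    return n * math.log10(p_per_factor)


def pseudo_count_from_logs(log_rho: float, log_rho_prime: float) -> float:
    """N = rho(1 - rho')/(rho' - rho), rewritten as (1 - rho')/(rho'/rho - 1)."""
    gain = log_rho_prime - log_rho  # prediction gain in nats
    if gain <= 0.0:
        return math.inf
    return -math.expm1(log_rho_prime) / math.expm1(gain)


print(log10_screen_prob(0.99))     # about -61.6
print(log10_screen_prob(0.99999))  # about -0.061, i.e. a probability near 0.868
```

For example, a screen with log ρ = -1000 whose probability doubles after one update gets N̂ close to 1, even though ρ itself is far too small to represent as a double.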

lake4790k, Jun 18 '16 14:06