Add Nonogram puzzle environment
Background
Nonogram is a logic puzzle. You are given a grid you need to color in black and white and "hints" - which indicate the run lengths.
For example, a clue of "4 8 3" would mean there are sets of four, eight, and three filled squares, in that order, with at least one blank square between successive sets.
This game is hard mathematically (it is "NP-HARD" meaning that we don't expect to find an algorithm that solves it completely but we have good heuristics)
The wikipedia article has a lot of nice techniques.
Environment
There are two types of actions coloring white and coloring black. I also experimented with allow toggle on and off, but I found the agent getting stuck in loops and I wanted to force it to see new boards).
I implemented two modes of the environment
One I call easy_learn (in practice it is hard to play for humans) in which a wrong move leads to an instant death. There are some ambiguity for randomly generated boards but I think the signal is strong regardless.
The other one has a couple of heuristics to check if the board is correct / incorrect
- if a play places a black square in a row without hints it's incorrect
- if a row has 10 black squares and the total of clues is 10 at the player colors another black it's incorrect
- If a player completes a row (as in the total equals the total of the clues) and the pattern doesn't match it's incorrect
- If for example a clue is "4 5 8" and color a black square creates 9 consecutive black squares it's incorrect.
This mode is harder to learn in practice.
I also give negative rewards for invalid moves and positive rewards if the player completes a row or a column or completes the board (a larger reward for completing the board).
Neural network
I tried many different kinds of architectures. In the end, I decided on encoding the board using a CNN the clues using a dense architecture separately (rows and columns) and also encoding the board size.
The grid has 4 types of cells (padding, empty, black, white) and clues have theoretically 0-8 (so 9 types).
In practice I statically put the maximum board size at 8.
Curriculum Learning
I do two types of randomization
- random board size
- I sample each black or white based on a probability p (sampled uniformly from 0 to 1)
I ran my best policy against different board sizes It can beat all sizes up until 5x5, and 70% of 6x6 boards.