galvanise_zero: Curious about the exact input and output of the model for Connect6

Firstly, I want to say "Good work".

I'm looking for the exact input and output of the model for Connect6.

Each move in Connect6 consists of two decisions: placing the first piece and placing the second piece. How do you handle the large action space for a move? Does the model output the two actions (a move) at the same time, or is it handled some other way?

I have read nn/model.py and defs/gamedesc.py, but I could not find the model settings for Connect6, and I am still confused about the exact input and output (the policy part).

Could you give me some advice on my questions, or point out which part I should dive into?

Sep 26 '18 05:09 Sharknevercries

> Firstly, I want to say "Good work".

Thanks!

That's a good question. The inputs and outputs are derived from the GDL description of the game: https://github.com/richemslie/gzero_games/blob/master/rulesheets/connect6.kif . For a good introduction to GDL, see this post: http://alloyggp.blogspot.com/2012/12/the-game-description-language.html

In ggplib (which ggpzero is based on), the bases and actions are grounded as part of the propnet construction. ggplib stores these in a statemachine model of the game, where the statemachine has the logic for getting the legal actions at each turn and for advancing the state. The bases are the set of possible binary propositions about the state of the game; the subset of propositions that are true defines a state. There is a set of legal actions for each role (player) at each move in a match. An important point is that if the game is turn based, the only legal action for the passive role will be a 'noop' action.
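To illustrate the idea of bases and per-role legals, here is a minimal, self-contained sketch. It does not use ggplib, and the proposition names ("cell", "control"), the role names and the helper functions are assumptions made up for this example; the real grounded names come from the rulesheet.

```python
# Hypothetical, simplified stand-ins; these are NOT ggplib's data structures.
BOARD_SIZE = 19
ROLES = ("black", "white")

# Grounded base propositions: each one is either true or false in a state.
bases = [("cell", x, y, colour)
         for x in range(1, BOARD_SIZE + 1)
         for y in range(1, BOARD_SIZE + 1)
         for colour in ROLES]
bases += [("control", role) for role in ROLES]

# A state is the set of base propositions that currently hold.
state = {("cell", 10, 10, "black"), ("control", "white")}


def state_as_bits(state):
    """Return the state as one bit per base proposition."""
    return [1 if base in state else 0 for base in bases]


def legal_actions(state, role):
    """Legal actions for a role; the passive role can only play noop."""
    if ("control", role) not in state:
        return ["noop"]
    occupied = {(p[1], p[2]) for p in state if p[0] == "cell"}
    return [("place", x, y)
            for x in range(1, BOARD_SIZE + 1)
            for y in range(1, BOARD_SIZE + 1)
            if (x, y) not in occupied]


# One joint move holds one action per role.
joint_legals = [legal_actions(state, role) for role in ROLES]
```

Here black is the passive role, so its only legal is noop, while white may place on any empty cell.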

The policy for the NN is then just the actions from the statemachine model. There will be one policy per role (only tested on games with len(roles) == 2).

The bases of the statemachine model are mapped to planes as inputs to the network. The high-level encoding is defined in defs/gamedesc.py, and the mapping code is in nn/bases.py.
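As a rough illustration of what "mapped to planes" can mean, the sketch below turns a set of true propositions into 19x19 binary planes. The channel layout, the proposition format ("cell", x, y, colour) and ("control", role), and the function name encode are all assumptions for this example; the actual layout is whatever defs/gamedesc.py and nn/bases.py define.

```python
BOARD_SIZE = 19

# Assumed channel layout: one plane per stone colour, plus one
# whole-plane flag per role that is set when that role is in control.
STONE_CHANNEL = {"black": 0, "white": 1}
CONTROL_CHANNEL = {"black": 2, "white": 3}
NUM_CHANNELS = 4


def encode(state):
    """Map a set of true propositions to NUM_CHANNELS binary planes."""
    planes = [[[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
              for _ in range(NUM_CHANNELS)]
    for prop in state:
        if prop[0] == "cell":
            _, x, y, colour = prop
            # Coordinates in the propositions are 1-based.
            planes[STONE_CHANNEL[colour]][y - 1][x - 1] = 1
        elif prop[0] == "control":
            channel = CONTROL_CHANNEL[prop[1]]
            planes[channel] = [[1] * BOARD_SIZE for _ in range(BOARD_SIZE)]
    return planes


planes = encode({("cell", 10, 10, "black"), ("control", "white")})
```

Board-position propositions set a single cell in a plane, while propositions without a position (such as whose turn it is) fill an entire plane.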

In the Connect6 case, the available actions are "(place x y)" and "noop" [1<=x<=19 and 1<=y<=19]. That gives 362 actions for each role (19 x 19 = 361 placements plus noop). When searching in MCTS, the legals will be a subset of the available actions, and the two separate "(place x y)" actions of a turn are searched over 2 plies of search.

So an example of opening moves might look like:

[(noop, (place 11 11)),
 (noop, (place 9 9)),
 ((place 10 11), noop),
 ((place 10 8), noop),
 ...]
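The policy indexing can be sketched as follows. This is only an illustration: the ordering of actions (noop first, then placements) and the helper masked_policy are assumptions, not the ordering or code used by ggplib or ggpzero.

```python
BOARD_SIZE = 19

# Assumed ordering: noop first, then every (place x y).
ACTIONS = ["noop"] + [("place", x, y)
                      for x in range(1, BOARD_SIZE + 1)
                      for y in range(1, BOARD_SIZE + 1)]
ACTION_INDEX = {action: i for i, action in enumerate(ACTIONS)}


def masked_policy(policy, legals):
    """Keep only the probability of legal actions and renormalise."""
    total = sum(policy[ACTION_INDEX[a]] for a in legals)
    return {a: policy[ACTION_INDEX[a]] / total for a in legals}


# A stand-in for the network's policy output for one role.
uniform = [1.0 / len(ACTIONS)] * len(ACTIONS)

# A two-stone turn is two consecutive single-action decisions, each
# read from the same 362-wide policy, restricted to that ply's legals.
first_ply = masked_policy(uniform, [("place", 10, 11), ("place", 10, 8)])
second_ply = masked_policy(uniform, [("place", 10, 8)])
```

Because each stone is its own ply, the policy head stays at 362 outputs per role instead of covering every pair of placements.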

To make this more concrete, one can examine the statemachine model for a game by:

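from ggplib.db import lookup

# Look up the grounded statemachine model for Connect6.
game_info = lookup.by_name("connect6")
print(game_info.model.bases)          # the base propositions
print(len(game_info.model.actions))   # 2
print(game_info.model.actions[0])     # the actions for the first role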

If you'd like to see the statemachine in action, see https://github.com/richemslie/ggplib/blob/master/src/ggplib/scripts/perf_test.py or, in C++, https://github.com/richemslie/ggplib/blob/master/src/cpp/perf_test.cpp

Sep 26 '18 08:09 richemslie

Just to make sure I'm not mistaken; looking at https://github.com/richemslie/galvanise_zero/blob/dev/src/ggpzero/defs/gamedesc.py, would it be correct to say that correctly setting up the input for the NN architecture still requires manual work and domain knowledge? In other words, if you get a brand new GDL game, you'll have to manually specify which channels you want and how they are populated?

On the other hand, I think the architecture of the policy head seems to be automatically determined? In other words, given a GDL game description, the program can automatically figure out an upper bound on the number of distinct legal actions (such that every legal move always has a unique index in the policy head)?

Oct 21 '20 12:10 DennisSoemers
