
Simplest 2 player perfect information game?

sullins2 opened this issue 2 years ago • 4 comments

As the title says, I'm wondering which game is the simplest (as in smallest, fastest to train on) for 2 players with perfect information. It looks like it should be Pathfinding, but I'm unsure how large the grid is and whether I can reduce the number of rows/columns. I'm also not sure whether multi-agent Pathfinding is zero-sum or cooperative, but that matters less.

sullins2 avatar Oct 14 '22 01:10 sullins2

Hi, pathfinding seems to be a good one. You can actually specify kExampleMultiAgentGrid to customize the grid you want. This game is general-sum; see, e.g., this paper.
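
For concreteness, a minimal sketch of loading pathfinding from Python (my addition, not from the thread); the override keys mentioned in the comment ("grid", "players") are assumptions, so inspect `get_parameters()` on the default game to see what is actually available:

```python
import pyspiel

# Load the default pathfinding game and inspect its parameters; the keys
# returned by get_parameters() tell you what can be overridden at load time,
# e.g. something like pyspiel.load_game("pathfinding", {"grid": my_grid_string,
# "players": 2}) -- the exact parameter names are an assumption here.
game = pyspiel.load_game("pathfinding")
print(game.get_parameters())
print(game.num_players(), game.max_game_length())
```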

rezunli96 avatar Oct 14 '22 01:10 rezunli96

Tic-Tac-Toe is my go-to example in this space. There are also games parameterizable by rows/columns, so you can shrink them to very small sizes (such as breakthrough down to maybe 4x4, hex to 3x3 or maybe even 2x2?).
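
For example (my addition, not from the thread), these small variants can be loaded from Python; the parameter names for breakthrough ("rows"/"columns") and hex ("board_size") are my best guess, so confirm them with `get_parameters()` on the default games:

```python
import pyspiel

# Tic-Tac-Toe takes no parameters.
ttt = pyspiel.load_game("tic_tac_toe")

# Shrunken boards; the parameter names here are assumptions -- check
# pyspiel.load_game("breakthrough").get_parameters() etc. to confirm them.
small_breakthrough = pyspiel.load_game("breakthrough", {"rows": 4, "columns": 4})
small_hex = pyspiel.load_game("hex", {"board_size": 3})

for g in (ttt, small_breakthrough, small_hex):
    print(g.get_type().short_name, g.num_distinct_actions(), g.max_game_length())
```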

You can also use game trees encoded in the Gambit .efg format if you want to go really small (see games/efg_game), e.g. the kinds of examples you'd find in a game theory textbook.
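
As a rough sketch of that route (my addition; the path is hypothetical and the "filename" parameter name is an assumption), loading a .efg file from Python might look like this:

```python
import pyspiel

# Hypothetical path -- point this at one of the sample .efg files shipped with
# OpenSpiel, or at a hand-written Gambit .efg file of your own.
efg_path = "/path/to/some_game.efg"

# Assumes the EFG loader is registered as "efg_game" with a "filename"
# parameter; confirm with pyspiel.registered_names() and the game's docs.
game = pyspiel.load_game("efg_game", {"filename": efg_path})
print(game.num_players(), game.max_game_length())
```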

lanctot avatar Oct 14 '22 11:10 lanctot

Thanks for the replies. I am going to work with tic-tac-toe for now, but may need to go even smaller in the future. Is there a section here that explains the Gambit format in more detail? I also had a couple of follow-up questions.

  1. Here (https://github.com/deepmind/open_spiel/issues/486) is an issue I was hitting and then corrected, but now I am getting NashConv values that grow (rapidly) to very large values, which I don't think makes sense in tic-tac-toe. Without getting into the details of the algorithm I am implementing, the NashConv values are as high as t=1: nashconv=60, t=2: nashconv=190, t=3: nashconv=300, and they continue to grow. Is this behavior indicative of any errors I might be making, similar to the ones in the other post?

  2. While I am posting: another problem I have is trying to print out all of the values of arrays in Colab, but they print as [0.1, 0.01, 0.1, ..., 0.0, 0.02]. I have searched all over for a way to force all of the values to be shown, but no luck. I'm wondering if anyone knows how to do this.

sullins2 avatar Oct 16 '22 19:10 sullins2

Hi @sullins2,

The gambit EFG representation is documented here. There are some examples in games/efg.

  1. Yes, that issue you're quoting was resolved once we switched to using State::HistoryString() as the cached value for a history. The NashConv in Tic-Tac-Toe has an upper bound of 4, so if you're getting values above that, there's definitely something broken somewhere. It's unlikely to be related to the other post, but it is an indication of a problem. As a quick test, could you measure the NashConv of the uniform policy in Tic-Tac-Toe (see the sketch after the note below)? What value does that give? If it's in the correct range (between 0 and 4), then I suspect the policy that you're sending to NashConv could be badly formed. Is it easy to submit a simple PR that reproduces the problem that we can look through? (Or maybe send me a colab by email or something?)

  2. I'm not sure how either; I usually just write a for loop. =)
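
If the arrays in question are NumPy arrays (an assumption; the thread doesn't say), raising the print threshold also works, alongside the for-loop approach mentioned above:

```python
import sys
import numpy as np

# Show every element instead of the "..." summary when printing large arrays.
np.set_printoptions(threshold=sys.maxsize)
values = np.arange(2000) / 100.0
print(values)

# Or the for-loop approach:
for i, v in enumerate(values):
    print(i, v)
```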

Note: the NashConv functions were designed for imperfect information games and are not super well-tested on perfect information games, so it is very possible that this has gotten broken and there's no test to catch it (but I would be surprised if that is true). For NashConv to work, the policy should be a mapping from information state strings, which makes it (unnecessarily) huge in Tic-Tac-Toe. There's a more efficient best response for this case, TabularBestResponseMDP, which allows state keys that are observations. But the default/standard one should still work.
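
As a concrete version of the quick test suggested in point 1 above (a sketch using the Python exploitability helpers as I recall them; double-check the imports against your OpenSpiel version):

```python
import pyspiel
from open_spiel.python import policy as policy_lib
from open_spiel.python.algorithms import exploitability

game = pyspiel.load_game("tic_tac_toe")

# Uniform-random policy over legal actions at every information state.
uniform = policy_lib.UniformRandomPolicy(game)

# Per the bound mentioned above, this should land in [0, 4] for Tic-Tac-Toe.
print("NashConv of uniform policy:", exploitability.nash_conv(game, uniform))
```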

lanctot avatar Oct 16 '22 22:10 lanctot

Thanks for the offers and the reply, @lanctot. I ended up getting both issues resolved. What happened was that I was using the CFR example (the smaller one), and in that example the current policy is accumulated to get the average policy. That accumulated policy was sent directly to nash_conv(), and I had assumed it was being normalized internally. I didn't actually need the average, so I took that out and it worked fine. (I might have changed some other things too; I forget exactly what they all were.) NashConv also appears to be working for tic-tac-toe, in case anyone sees this in the future. Thanks for the help (@rezunli96 too).
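
For anyone hitting the same thing: the pitfall, roughly, is that nash_conv() expects action probabilities that sum to 1 at every information state, so accumulated policy weights need to be normalized first. A sketch under that assumption (the `cumulative` name and dict layout are hypothetical, not taken from the CFR example):

```python
from open_spiel.python import policy as policy_lib
from open_spiel.python.algorithms import exploitability
import pyspiel

game = pyspiel.load_game("tic_tac_toe")

def normalized_average(game, cumulative):
    """Converts accumulated info-state action weights into a TabularPolicy
    whose rows sum to 1, which is what nash_conv expects.

    `cumulative` is a hypothetical dict: info_state_string -> {action: weight}.
    """
    tab = policy_lib.TabularPolicy(game)
    for info_state, weights in cumulative.items():
        row = tab.policy_for_key(info_state)
        total = sum(weights.values())
        if total <= 0:
            continue  # keep the default uniform-over-legal-actions row
        row[:] = 0.0
        for action, weight in weights.items():
            row[action] = weight / total
    return tab

# Example usage, given some accumulated `cumulative` dict:
# print(exploitability.nash_conv(game, normalized_average(game, cumulative)))
```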

sullins2 avatar Oct 29 '22 18:10 sullins2