
About using different players for training game generation

Open remdu opened this issue 6 years ago • 6 comments

So I have a question related to another similar project, for Go: https://github.com/gcp/leela-zero. In that project, self-play games are generated by the same player playing against itself, so black and white share the same random seed and a common search tree through tree reuse. If I'm reading the code right, in reversi-alpha-zero two independent players generate the self-play games, each with its own search tree and its own random seed. I am very curious about the effects of these two approaches. What have been your results?

remdu avatar Mar 01 '18 17:03 remdu

Hi @Eddh

I am very curious about the effects of the 2 different ways of doing this. What have been your results ?

Though I also enabled sharing tree-search information via share_mtcs_info_in_self_play, I haven't seen a clear difference between sharing and separating it. My feeling is that perfectly separating it (between black and white within a game) wastes some computation, while sharing it among games brings a kind of overfitting or mode collapse.
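The trade-off can be sketched as follows (a minimal illustration, not the project's actual code; `MCTSTables` and `make_players` are hypothetical names):

```python
from collections import defaultdict

class MCTSTables:
    """Per-position search statistics (visit counts and value sums)."""
    def __init__(self):
        self.visits = defaultdict(int)       # state -> N(s)
        self.value_sum = defaultdict(float)  # state -> sum of backed-up values

def make_players(share_tree):
    """share_tree=True: black and white read/write the same statistics
    (cheaper, since positions searched by one color help the other,
    but their play becomes correlated). share_tree=False: each color
    searches from scratch, which costs more but adds some randomness."""
    if share_tree:
        shared = MCTSTables()
        return shared, shared
    return MCTSTables(), MCTSTables()
```

With a shared table, a visit recorded by one color is immediately visible to the other; with separate tables it is not.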

If I had rich computational resources, it might be better to separate them completely, because that brings a little extra randomness.

mokemokechicken avatar Mar 04 '18 06:03 mokemokechicken

Thank you for the answer. I have been curious about this, but maybe it has less of an effect than I expected. Did you run tests on reusing tree information and its effect on the effectiveness of the Dirichlet noise? In other related projects, the consensus seems to be that reuse does make the Dirichlet noise less effective, but that as long as it doesn't completely prevent the discovery of new moves, the speed boost is worth the cost.
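For context, the root-noise recipe these projects follow comes from the AlphaZero papers: the root prior is mixed as P(s,a) = (1 − ε)·p_a + ε·η_a with η ~ Dir(α). A minimal sketch (the function name and the ε/α defaults here are illustrative, not the repo's actual settings):

```python
import numpy as np

def add_dirichlet_noise(priors, epsilon=0.25, alpha=0.5, rng=None):
    """Mix Dirichlet noise into the root priors:
    P(s,a) = (1 - epsilon) * p_a + epsilon * eta_a, eta ~ Dir(alpha).
    Reusing a tree dilutes this: the pre-existing visit counts already
    reflect the *unnoised* priors, so the noise shifts search less."""
    rng = rng or np.random.default_rng()
    priors = np.asarray(priors, dtype=float)
    noise = rng.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * priors + epsilon * noise
```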

remdu avatar Mar 26 '18 16:03 remdu

Did you do tests regarding reusing tree information and the effect it has on the effectiveness of Dirichlet noise?

I tested reusing tree information and inspected the moves. In the early phase of training, even when reusing it across several games, there were no (or very few) games with completely identical moves. In the later phase, however, there were many identical games even without reusing tree information.

Although this is a slightly different topic...

Reversi has draws. I think that if both black and white believe "the best result from this position is a draw", the game tends to end in a draw: since they can only find "lose" and "draw" moves, they select the known "draw" moves and have little motivation to search for new "win" moves. In a game without draws (like Go), every move is either a "lose" or a "win" move; each side selects the moves it believes to be wins, so the losing side keeps finding new moves.
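This draw trap shows up directly in PUCT move selection (a toy illustration with made-up numbers; `puct_select` is a hypothetical helper): once a draw move has Q = 0 and many visits, its score stays above any losing move's Q = −1 plus the exploration bonus, so the search keeps confirming the draw instead of hunting for an undiscovered win.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the move maximizing Q + U, where
    U = c_puct * P * sqrt(sum(N)) / (1 + N)."""
    total_n = sum(n for _, n, _ in children.values()) + 1
    def score(move):
        q, n, p = children[move]
        return q + c_puct * p * math.sqrt(total_n) / (1 + n)
    return max(children, key=score)

children = {  # move: (mean value Q, visit count N, prior P)
    "draw_move":  ( 0.0, 100, 0.30),
    "lose_move1": (-1.0,   5, 0.35),
    "lose_move2": (-1.0,   5, 0.35),
}
```

Here the exploration bonus on the losing moves (≈ 0.9) never overcomes their Q = −1 deficit against the well-visited draw, so the known draw keeps getting selected.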

I was troubled by this "many draw games (80~90%)" problem. It was difficult to break out of this situation.

Reusing tree information tends to aggravate the problem, so I think it might be better to separate the trees completely, since that brings a little extra randomness.

mokemokechicken avatar Mar 27 '18 09:03 mokemokechicken

Maybe that is just the nature of Reversi; see https://en.wikipedia.org/wiki/Computer_Othello, the "Othello 8 x 8" section. That being said, even if enough randomness is ensured during training, the games will still tend to end in draws.

As said in that link:

Regarding the three main openings of diagonal, perpendicular and parallel, it appears that both diagonal and perpendicular openings lead to drawing lines, while the parallel opening is a win for black.

Is your model playing diagonal opening or perpendicular opening?

gooooloo avatar Mar 27 '18 10:03 gooooloo

Is your model playing diagonal opening or perpendicular opening?

Several openings, including diagonal and perpendicular, were played. If the model played the best moves there would be no problem; however, the model lost against NTest at level 9 and above.

mokemokechicken avatar Mar 27 '18 22:03 mokemokechicken

I see. Looking forward to a solution being found~

gooooloo avatar Mar 28 '18 02:03 gooooloo