idea: Open all 8 symmetries at root to avoid bumpy policy

Open amj opened this issue 6 years ago • 3 comments

Once a move is played, the resulting position becomes the new root and we have to search it "for real". But the policy priors for that move are whatever came out of the single random rotation used to open the node during the earlier search.

So, before beginning tree search, evaluate the root with all 8 symmetries and average the resulting policies together.
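A minimal sketch of the idea, not Minigo's actual code. It assumes a hypothetical `evaluate(board)` callable that takes an NxN grid and returns an NxN policy grid; the pass move and the value head are left out for brevity. Each of the 8 symmetries is applied to the board, the policy is mapped back to the original orientation, and the results are averaged.

```python
from typing import Callable, List

Grid = List[List[float]]


def rot90(g: Grid) -> Grid:
    """Rotate a square grid 90 degrees counter-clockwise."""
    n = len(g)
    return [[g[c][n - 1 - r] for c in range(n)] for r in range(n)]


def flip(g: Grid) -> Grid:
    """Mirror a grid left-to-right."""
    return [row[::-1] for row in g]


def apply_symmetry(g: Grid, k: int) -> Grid:
    """Apply symmetry k in 0..7: k % 4 rotations, then a flip if k >= 4."""
    out = g
    for _ in range(k % 4):
        out = rot90(out)
    return flip(out) if k >= 4 else out


def invert_symmetry(g: Grid, k: int) -> Grid:
    """Undo apply_symmetry(., k): undo the flip first, then the rotations."""
    out = flip(g) if k >= 4 else g
    for _ in range((4 - k % 4) % 4):
        out = rot90(out)
    return out


def averaged_root_policy(board: Grid, evaluate: Callable[[Grid], Grid]) -> Grid:
    """Average the policy over all 8 symmetries, in the board's own orientation.

    `evaluate` is a hypothetical stand-in for a network call.
    """
    n = len(board)
    total = [[0.0] * n for _ in range(n)]
    for k in range(8):
        policy_k = evaluate(apply_symmetry(board, k))
        back = invert_symmetry(policy_k, k)  # map back to the original frame
        for r in range(n):
            for c in range(n):
                total[r][c] += back[r][c] / 8.0
    return total
```

This costs 8 inferences at the root instead of 1, which is why it is only proposed for the root and not for every node in the tree.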

This could be put behind a flag and directly tested for rating effect.

amj, Aug 22 '19 21:08

This observation applies at all nodes, not just the root node. And the root node is the one most likely to have its policy prior overridden, since it accumulates all of the subsequent evaluation data. I think it might make sense for the oscillating playouts, since there you have fewer reads and the result depends heavily on the initial policy priors. (This is doubly so with a -1 value prior, which we know tends to make the MCTS go very deep on the first move.)

brilee, Aug 23 '19 17:08

Yes, it applies to all nodes, but doing 8x the reads is (generally) better than spending those evaluations on averaging all 8 symmetries. With the inference cache, since it's close to free to rebuild the tree, it's easy to drop the tree, get a 'truer' top-level policy, and rebuild. This goes double for PCO, since we only drop the tree before a 'full readout'.

amj, Aug 23 '19 17:08

We've added a symmetry-aware inference cache that averages all symmetries for a position, so (assuming an infinite cache) it will tend towards returning the average of all 8 symmetries. For example, when running 9x9 selfplay with a 32GB cache, we end up with a cache hit-rate of >60%. This strongly indicates that nearly all of the games end up looking alike. So if we did something like this suggestion, we'd also have to add more noise to the root, or possibly to every inference.
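To illustrate how such a cache can drift towards the 8-way average, here is a rough sketch. It is a guess at the general shape, not Minigo's implementation: the class name, the `lookup` method, and the `evaluate` callable are all hypothetical, and the miss/hit rule (a lookup is a miss only when the requested symmetry has not yet been merged into the entry) is an assumption.

```python
from typing import Callable, Dict, List, Set, Tuple

Grid = List[List[float]]


def rot90(g: Grid) -> Grid:
    """Rotate a square grid 90 degrees counter-clockwise."""
    n = len(g)
    return [[g[c][n - 1 - r] for c in range(n)] for r in range(n)]


def flip(g: Grid) -> Grid:
    """Mirror a grid left-to-right."""
    return [row[::-1] for row in g]


def apply_symmetry(g: Grid, k: int) -> Grid:
    """Apply symmetry k in 0..7: k % 4 rotations, then a flip if k >= 4."""
    out = g
    for _ in range(k % 4):
        out = rot90(out)
    return flip(out) if k >= 4 else out


def invert_symmetry(g: Grid, k: int) -> Grid:
    """Undo apply_symmetry(., k): undo the flip first, then the rotations."""
    out = flip(g) if k >= 4 else g
    for _ in range((4 - k % 4) % 4):
        out = rot90(out)
    return out


class SymmetryAveragingCache:
    """Hypothetical cache: one entry per position, averaging the symmetries seen."""

    def __init__(self, evaluate: Callable[[Grid], Grid]) -> None:
        self.evaluate = evaluate  # stand-in for a network call (assumption)
        self.entries: Dict[tuple, Tuple[Grid, Set[int]]] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, board: Grid, k: int) -> Grid:
        """Return the average policy over every symmetry merged so far."""
        n = len(board)
        key = tuple(tuple(row) for row in board)
        if key not in self.entries:
            self.entries[key] = ([[0.0] * n for _ in range(n)], set())
        total, seen = self.entries[key]
        if k in seen:
            self.hits += 1  # symmetry already merged: no inference needed
        else:
            self.misses += 1  # new symmetry: run inference and merge it in
            back = invert_symmetry(self.evaluate(apply_symmetry(board, k)), k)
            for r in range(n):
                for c in range(n):
                    total[r][c] += back[r][c]
            seen.add(k)
        return [[v / len(seen) for v in row] for row in total]
```

With a random `k` per lookup, a frequently visited position fills in all 8 symmetries after a handful of misses, and every later lookup returns the full average for free. A position seen only once still gets a single-symmetry policy, which is the case the original suggestion targets at the root.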

tommadams, Sep 21 '19 00:09