Idea: new NN target - trying to predict NN per-location ownership filtered through MCTS (plus a few questions).

Open lukaszlew opened this issue 2 years ago • 3 comments

Context

If I understand g170/selfplay/README.txt correctly, globalTargetsNC C7, C11, C15 are point results based on the predictions of the NN itself, averaged along the MCTS path with an appropriate $\lambda$.

globalTargetsNC C3 can be understood as the $\lambda = 1$ case, since at the end of the game the NN prediction and the Go rules should give the same result. Q: Is my understanding correct?

valueTargetsNCHW C0 is analogous to globalTargetsNC C3, but operates on a per-board-location basis. In particular, globalTargetsNC C3 is the sum of valueTargetsNCHW C0 over the HW dimensions. Q: Is my understanding correct? Q: Are these used as a regularization head in the network training?
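To make the claimed relationship concrete, here is a minimal sketch of summing a per-location ownership plane over H and W to get a scalar point result. The function name and the toy 3x3 board are hypothetical, and the sketch ignores komi and any rule-specific adjustments; whether the real targets relate exactly this way is the open question above.

```python
def score_from_ownership(ownership_hw):
    """Sum a per-location ownership plane (H x W) into one scalar.

    ownership_hw: list of rows, each value in [-1, 1]
        (+1 = owned by the player to move, -1 = owned by the opponent).
    This is an illustrative assumption, not KataGo's actual code.
    """
    return sum(sum(row) for row in ownership_hw)


# Toy 3x3 "board": five points for one side, three for the other, one neutral.
toy_ownership = [
    [1.0, 1.0, 1.0],
    [1.0, 0.0, -1.0],
    [1.0, -1.0, -1.0],
]
toy_score = score_from_ownership(toy_ownership)
```

Under this assumption the scalar target carries strictly less information than the plane it is summed from.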

Idea

Introduce analogues of globalTargetsNC C7, C11, C15 in valueTargetsNCHW. That is, an NN target that tries to predict the NN's per-location ownership filtered through MCTS.
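A rough sketch of what such a target could look like, assuming a TD($\lambda$)-style exponential average: each future turn's search-averaged ownership estimate gets weight $(1-\lambda)\lambda^k$, and the leftover weight goes to the final ownership. The function name, argument layout, and exact weighting are assumptions for illustration, not taken from the KataGo code.

```python
def lambda_ownership_target(search_ownership, final_ownership, lam):
    """Exponentially weighted per-location ownership target.

    search_ownership: list over turns t, t+1, ... of flat per-location
        ownership lists (hypothetical MCTS-averaged NN ownership output).
    final_ownership: flat per-location ownership at the end of the game.
    lam: decay in [0, 1]; lam = 1 reproduces the final ownership,
        lam = 0 uses only the current turn's search estimate.
    """
    n_points = len(final_ownership)
    target = [0.0] * n_points
    weight = 1.0 - lam  # weight on the current turn's estimate
    for own in search_ownership:
        for i in range(n_points):
            target[i] += weight * own[i]
        weight *= lam  # later turns count geometrically less
    # Remaining weight lam**n goes to the true end-of-game ownership,
    # so all weights sum to 1.
    tail = lam ** len(search_ownership)
    for i in range(n_points):
        target[i] += tail * final_ownership[i]
    return target


# Two future turns, two board points, lam = 0.5.
example_target = lambda_ownership_target(
    search_ownership=[[1.0, -1.0], [0.0, -1.0]],
    final_ownership=[-1.0, -1.0],
    lam=0.5,
)
```

With several values of `lam`, this would give one extra plane per value, mirroring how the global targets are described above.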

Motivation:

Relative to a mid-game position, end-of-game per-location ownership carries a lot of "noise" from the intervening moves, whereas MCTS covers only the nearest sequence.
This does not seem too expensive to compute.
The per-location end-of-game ownership target is an excellent regularizer, so this may be worth the effort?

Mar 25 '23 20:03 lukaszlew