Idea: new NN target - trying to predict NN per-location ownership filtered through MCTS (plus a few questions).
Context
If I understand g170/selfplay/README.txt correctly, globalTargetsNC C7, C11, C15 are point results evaluated from the predictions of the NN itself, averaged along the MCTS path with an appropriate $\lambda$.
globalTargetsNC C3 can then be understood as the $\lambda = 1$ case, since at the end of the game the NN prediction and the Go rules should give the same result.
Q: Is my understanding correct?
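To make the question concrete, here is a minimal sketch of how I read the $\lambda$-averaging. It is only my assumption of the scheme, not KataGo's actual code: the function name `lambda_targets` and its arguments are hypothetical, and the per-turn search estimates are blended backwards from the final result with an exponential weight.

```python
import numpy as np

def lambda_targets(search_values, final_result, lam):
    """Blend per-turn search estimates toward the final game result.

    search_values: one scalar search estimate per turn (hypothetical input).
    final_result:  the actual result scored by the rules at game end.
    lam:           weight in [0, 1] given to the future; lam = 1 keeps only
                   the final result, lam = 0 keeps only the current estimate.
    """
    search_values = np.asarray(search_values, dtype=np.float64)
    targets = np.empty_like(search_values)
    running = float(final_result)
    # Walk backwards so each turn mixes its own estimate with the
    # already-blended value of everything that follows it.
    for t in range(len(search_values) - 1, -1, -1):
        running = (1.0 - lam) * search_values[t] + lam * running
        targets[t] = running
    return targets

# With lam = 1 every turn gets the final result.
print(lambda_targets([0.0, 2.0], final_result=4.0, lam=1.0))
# With lam = 0.5 the targets sit between search estimates and the result.
print(lambda_targets([0.0, 2.0], final_result=4.0, lam=0.5))
```

Under this reading, $\lambda = 1$ reproduces the final result at every turn, which is why I think of C3 as that limiting case.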
valueTargetsNCHW C0 is analogous to globalTargetsNC C3, but operates on per-board-location basis.
In particular, globalTargetsNC C3 is the sum of valueTargetsNCHW C0 over the H and W dimensions.
Q: Is my understanding correct?
Q: Are these used as a regularization head in the network training?
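The relation I have in mind between the two targets, as a toy NumPy sketch. The array here is made up for illustration (it is not read from a real training file), and I am assuming area-style scoring and ignoring komi, so the real C3 may differ by such terms.

```python
import numpy as np

# Hypothetical ownership batch with shape (N, H, W):
# +1 = owned by the player to move, -1 = owned by the opponent, 0 = neutral.
ownership = np.array(
    [[[1, 1, 1],
      [1, 0, -1],
      [1, -1, -1]]],
    dtype=np.float32,
)

# Summing over the board dimensions gives one score-like number per sample.
board_score = ownership.sum(axis=(1, 2))
print(board_score)  # one entry per sample, here 2.0
```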
Idea
Introduce analogues of globalTargetsNC C7, C11, C15 in valueTargetsNCHW, i.e., an NN target that tries to predict the NN per-location ownership filtered through MCTS.
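A rough sketch of what such a target could look like, assuming the search already produces a per-location ownership estimate at every turn. The name `ownership_lambda_targets` and the input layout are hypothetical, and the exponential blending is my guess at the natural per-location analogue, not an existing KataGo feature.

```python
import numpy as np

def ownership_lambda_targets(search_ownership, final_ownership, lam):
    """Per-location analogue of a lambda-averaged scalar target.

    search_ownership: array of shape (T, H, W), the search-averaged NN
                      ownership estimate at each of T turns (hypothetical).
    final_ownership:  array of shape (H, W), ownership at the end of the game.
    lam:              weight in [0, 1] given to the future of the game.
    """
    search_ownership = np.asarray(search_ownership, dtype=np.float64)
    running = np.asarray(final_ownership, dtype=np.float64).copy()
    targets = np.empty_like(search_ownership)
    # Same backward blend as for a scalar target, applied per board location.
    for t in range(search_ownership.shape[0] - 1, -1, -1):
        running = (1.0 - lam) * search_ownership[t] + lam * running
        targets[t] = running
    return targets

# Tiny 1x2 "board" over two turns.
search = [[[0.0, 1.0]], [[1.0, -1.0]]]
final = [[1.0, -1.0]]
print(ownership_lambda_targets(search, final, lam=0.5))
```

With `lam = 1` this collapses to the existing end-of-game ownership target at every turn, so smaller values of `lam` would give the shorter-horizon variants.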
Motivation:
- End-of-game per-location ownership, viewed from a mid-game position, carries a lot of "noise" from the intermediate moves. MCTS sees only the nearest sequence of moves.
- It does not seem too expensive to compute.
- Per-location end-of-game ownership target is an excellent regularizer, so maybe worth the effort?