KataGo
Ask for guidance
//Fill td-like value targets
What does "td-like" mean here, and how should it be understood?
I'm not sure exactly what your question is. TD stands for Temporal Difference, a reinforcement learning method (see https://en.wikipedia.org/wiki/Temporal_difference_learning).
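For reference, here is the standard TD(λ) "λ-return" target, specialized (as an assumption, for illustration) to a game with no discounting and no intermediate rewards, where z is the final outcome and V(s) is the value estimate of position s. The terminal value is treated as the outcome itself:

```latex
G_t^{\lambda} = (1 - \lambda)\, V(s_{t+1}) + \lambda\, G_{t+1}^{\lambda},
\qquad V(s_T) = G_T^{\lambda} = z
```

With λ = 0 this reduces to the one-step TD target V(s_{t+1}). With λ = 1 it becomes the plain Monte Carlo target z. Values in between blend the two.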
For details on KataGo's td-like value targets, you may want to check issue #258.
//C4-7: MCTS win-loss-noresult estimate td-like target, lambda = 1 - 1/(1 + boardArea * 0.176)
//C8-11: MCTS win-loss-noresult estimate td-like target, lambda = 1 - 1/(1 + boardArea * 0.056)
//C12-15: MCTS win-loss-noresult estimate td-like target, lambda = 1 - 1/(1 + boardArea * 0.016)
//C16-19: MCTS win-loss-noresult estimate td-like target, lambda = 0 (so, actually just the immediate MCTS result)
This is in trainingwrite.h, and I don't know what it means.
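One way to read those comments, offered as an interpretation rather than a description of KataGo's code: each channel group holds a value target that blends the MCTS estimate at the current turn with the estimates at later turns, using geometrically decaying weights. The lambda formula controls how far ahead it looks. With A = boardArea, λ = A·c/(1 + A·c), so the mean lookahead λ/(1 − λ) is exactly A·c turns (ignoring the end of the game). On 19x19 that is about 63.5, 20.2 and 5.8 turns for c = 0.176, 0.056 and 0.016, and 0 turns for C16-19.

The C++ below is a minimal sketch under that assumption. `tdLambda` and `tdLikeTargets` are hypothetical names, and the backward recursion is a λ-return style blend, not KataGo's actual implementation. A single scalar value stands in for KataGo's win/loss/noresult triple; the same blend could be applied to each component.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// lambda as written in the trainingwrite.h comments:
//   lambda = 1 - 1/(1 + boardArea * c)
// with c = 0.176 (C4-7), 0.056 (C8-11) or 0.016 (C12-15).
double tdLambda(int boardArea, double c) {
  return 1.0 - 1.0 / (1.0 + boardArea * c);
}

// Hypothetical helper, NOT KataGo's actual code: a TD(lambda)-style target
// built backwards from the end of the game:
//   y[T] = finalResult,  y[t] = (1 - lambda) * v[t] + lambda * y[t + 1]
// mctsValues[t] is the MCTS value estimate at turn t and finalResult is the
// game outcome, both from the same player's point of view (an assumption).
std::vector<double> tdLikeTargets(const std::vector<double>& mctsValues,
                                  double finalResult, double lambda) {
  assert(lambda >= 0.0 && lambda <= 1.0);
  std::vector<double> targets(mctsValues.size());
  // Target just past the last turn is the actual game result.
  double next = finalResult;
  // Walk backwards so each turn's target reuses the one after it.
  for (std::size_t i = mctsValues.size(); i-- > 0;) {
    next = (1.0 - lambda) * mctsValues[i] + lambda * next;
    targets[i] = next;
  }
  return targets;
}
```

For example, `tdLikeTargets(v, z, 0.0)` returns `v` unchanged, which matches the C16-19 case ("just the immediate MCTS result"). With lambda = 1, every turn's target would instead be the final result `z`.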
I assume that, once a selfplay game is complete, it writes the training data (a description of each position plus the training targets for the policy head, value head, ownership head, etc.) to a file on your computer, which is then uploaded to the training server and used to train the network.
See here for a description of the training file structure and particularly for a description of the different channels C0, C1, ...