KataGo Ask for guidance

Ask for guidance

Open Guyuena opened this issue 3 years ago • 3 comments

//Fill td-like value targets

What does "td-like" mean here? How should it be understood?

Mar 27 '22 03:03 Guyuena

Not sure what exactly your question is. TD = Temporal Difference, a reinforcement learning method (see https://en.wikipedia.org/wiki/Temporal_difference_learning).

For details on KataGo's td-like value targets, you may want to check issue #258.

Mar 28 '22 09:03 Ishinoshita

//C4-7: MCTS win-loss-noresult estimate td-like target, lambda = 1 - 1/(1 + boardArea * 0.176)
//C8-11: MCTS win-loss-noresult estimate td-like target, lambda = 1 - 1/(1 + boardArea * 0.056)
//C12-15: MCTS win-loss-noresult estimate td-like target, lambda = 1 - 1/(1 + boardArea * 0.016)
//C16-19: MCTS win-loss-noresult estimate td-like target, lambda = 0 (so, actually just the immediate MCTS result).

This is from trainingwrite.h. I don't understand what it means.
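To make the lambda formulas in the quoted comments concrete, here is a minimal Python sketch. It is not KataGo's code. It assumes that "td-like" means an exponentially weighted average of the MCTS value estimates from the current turn to the end of the game, where each turn takes a fraction (1 - lambda) of the remaining weight and the final entry (standing in for the game outcome) takes whatever is left. The names `td_lambda` and `td_like_target` are hypothetical and chosen only for illustration.

```python
def td_lambda(board_area: int, scale: float) -> float:
    """lambda = 1 - 1/(1 + boardArea * scale), as written in the quoted comments."""
    return 1.0 - 1.0 / (1.0 + board_area * scale)


def td_like_target(future_values: list[float], lam: float) -> float:
    """Exponentially weighted average of value estimates (assumed interpretation).

    future_values[0] is the estimate at the current turn; the last entry
    stands in for the final game result and absorbs all remaining weight.
    """
    now_factor = 1.0 - lam      # share of the remaining weight given to each turn
    weight_left = 1.0
    target = 0.0
    last = len(future_values) - 1
    for i, value in enumerate(future_values):
        # The final entry takes everything that is left, so weights sum to 1.
        weight_now = weight_left if i == last else weight_left * now_factor
        weight_left -= weight_now
        target += weight_now * value
    return target


if __name__ == "__main__":
    board_area = 19 * 19
    for scale in (0.176, 0.056, 0.016):
        print(scale, round(td_lambda(board_area, scale), 4))
    # With lambda = 0 the target is just the immediate estimate.
    print(td_like_target([0.2, 0.4, 1.0], 0.0))
```

On a 19x19 board (boardArea = 361) the three formulas give lambda of roughly 0.985, 0.953, and 0.852. Under this reading, the larger the lambda, the further into the future the target looks, and lambda = 0 collapses to the immediate MCTS result, which matches the comment on C16-19.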

Apr 01 '22 02:04 Guyuena

I assume that, once a selfplay game is complete, it writes the training data (a description of each position plus the training targets for the policy head, value head, ownership head, etc.) to a file on your computer, which is then uploaded to the training server and used to train the network.

See here for a description of the training file structure and particularly for a description of the different channels C0, C1, ...

Apr 01 '22 06:04 Ishinoshita
