David J Wu

382 comments by David J Wu

Did you mistype something, or can you clarify? You write: > A tiny non-zero floor (or any similarly selective boost) is simply a mechanism to break that dead loop....
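For concreteness, a minimal sketch of the "tiny non-zero floor" idea being quoted: clamp each policy prior to a small epsilon and renormalize, so moves the net assigns ~0 probability still receive a sliver of search. The `eps` value and the renormalization scheme here are illustrative assumptions, not KataGo's implementation.

```python
import numpy as np

def floor_policy(prior: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Clamp each prior to at least eps, then renormalize.

    Illustrative only: moves the net assigns ~0 probability still get
    searched occasionally, so a self-reinforcing "never searched, never
    corrected" loop can break.
    """
    floored = np.maximum(prior, eps)
    return floored / floored.sum()

prior = np.array([0.97, 0.03, 0.0, 0.0])
print(floor_policy(prior))  # the zero-prior moves now get a small share
```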

If you can find something better, it would be very cool. I've tried vanilla self-attention once or twice before, but the challenge is that it's expensive for a 19x19 board,...
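To make the cost concern concrete, here is a toy sketch of vanilla self-attention over a flattened 19x19 board; the channel dimension `d` is an arbitrary assumption. Every head at every layer materializes a 361x361 score matrix, which is where the quadratic expense comes from.

```python
import numpy as np

# A 19x19 board flattened to 361 tokens. Vanilla self-attention forms a
# full token-by-token score matrix, so cost grows quadratically in tokens.
n_tokens, d = 19 * 19, 64          # 361 positions, hypothetical channel dim
q = np.random.randn(n_tokens, d)
k = np.random.randn(n_tokens, d)
v = np.random.randn(n_tokens, d)

scores = q @ k.T / np.sqrt(d)                      # (361, 361) matrix
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
out = weights @ v

print(scores.shape)  # (361, 361): ~130k score entries per head, per layer
```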

Can you be specific/formal? What numbers do you propose recording in the data and training the neural net to predict? For example, the current policy head, brushing aside details around...
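As a reference point for "what numbers get recorded": the standard AlphaZero-style policy target is the normalized MCTS visit distribution, trained against with cross-entropy. A minimal sketch (the visit counts and logits below are made up):

```python
import numpy as np

# The policy head's recorded target is the normalized MCTS visit
# distribution over moves; training minimizes cross-entropy between the
# net's policy and that distribution.
visits = np.array([620.0, 250.0, 100.0, 30.0, 0.0])  # hypothetical search
target = visits / visits.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])       # net's raw policy
policy = np.exp(logits - logits.max())
policy /= policy.sum()

loss = -np.sum(target * np.log(policy + 1e-12))      # cross-entropy
print(target, loss)
```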

What is the loss function / reward function for the A output, i.e. what incentivizes it to be good as opposed to just random-walking or converging to arbitrary distributions that have...
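The concern is concrete: an output head that no loss term references receives no gradient at all. A tiny PyTorch sketch (the model shapes and the name `a_head` are hypothetical, standing in for the proposed extra output):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

trunk = nn.Linear(8, 8)
policy_head = nn.Linear(8, 4)
a_head = nn.Linear(8, 4)  # the proposed extra "A" output (hypothetical)

h = trunk(torch.randn(1, 8))
policy_logits = policy_head(h)
a_out = a_head(h)  # computed, but never referenced by the loss

loss = F.cross_entropy(policy_logits, torch.tensor([2]))
loss.backward()

print(policy_head.weight.grad is None)  # False: the policy head learns
print(a_head.weight.grad is None)       # True: A gets no training signal
```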

> I reason it could probably be trained as part of the self play? I don't think you can just handwave at this and say that it gets "trained" when...

Again, can you be more precise? What do you mean by "evaluation results" - do you mean utility, i.e. winrate and score? And can you be more precise about what...
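On what "utility" typically means here: KataGo's search maximizes a utility blending a win/loss term with bounded score terms. The transforms and weights below are placeholder assumptions, not KataGo's actual formula:

```python
import numpy as np

def utility(winrate: float, score_lead: float,
            score_weight: float = 0.3, score_scale: float = 20.0) -> float:
    """Hypothetical utility: a win/loss term in [-1, 1] plus a squashed,
    bounded score term. KataGo's real search blends win/loss utility with
    bounded score utilities in a similar spirit (exact transforms differ).
    """
    winloss = 2.0 * winrate - 1.0                # map winrate to [-1, 1]
    score = np.tanh(score_lead / score_scale)    # bounded score signal
    return winloss + score_weight * score

print(utility(winrate=0.55, score_lead=3.0))
```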

@megabyte0 Keep in mind that AlphaZero-style training *already is based on iterative student-teacher training*, where the policy network learns to predict a vastly stronger teacher (MCTS) that uses ~1000x more...

And how is the "blindspots" network itself supposed to be trained? Is it the same as the way networks are trained right now, except using a teacher MCTS that has...

Basically, you said: > Now we are talking not about the learning phase, but about using the "current (trained) network/model" in the optimal way, e.g. via katago gtp kata-analyze commands...
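For reference, `kata-analyze` is KataGo's GTP extension for streaming search output from the current network. A rough sketch of driving it from Python; the binary, model, and config paths are placeholders, and the full command syntax is in KataGo's GTP extensions documentation:

```python
import subprocess

# Minimal sketch: launch KataGo in GTP mode and stream analysis lines.
proc = subprocess.Popen(
    ["katago", "gtp", "-model", "model.bin.gz", "-config", "gtp.cfg"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

proc.stdin.write("boardsize 19\n")
proc.stdin.write("kata-analyze interval 50\n")  # report every 50 centiseconds
proc.stdin.flush()

# The engine streams `info move ... visits ... winrate ... pv ...` lines,
# reflecting the current network *plus search*, not the raw policy alone.
for _ in range(20):
    line = proc.stdout.readline()
    if line.startswith("info "):
        print(line.strip())
```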

KataGo is trained through AlphaZero-like self-play, which involves training a policy and value network to predict the moves and game outcomes from MCTS using that policy and value network. The...
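Schematically, the loop described here looks like the following sketch; `run_mcts`, `new_game`, and `train`, along with the game-object methods, are hypothetical stand-ins rather than KataGo's actual pipeline code:

```python
import numpy as np

def self_play_iteration(net, run_mcts, new_game, train, num_games=100):
    """Schematic AlphaZero-style iteration (a sketch under assumed
    interfaces, not KataGo's pipeline). The net's policy/value guide MCTS;
    MCTS visit counts and the final outcome become the next training targets.
    """
    data = []
    for _ in range(num_games):
        game, records = new_game(), []
        while not game.over():
            visits = run_mcts(net, game)            # search guided by the net
            records.append((game.state(), visits))  # policy target: visits
            move = np.random.choice(len(visits), p=visits / visits.sum())
            game.play(move)
        data.extend((s, v, game.outcome()) for s, v in records)
    # Cross-entropy toward visit distributions; regression toward outcomes.
    return train(net, data)
```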