David J Wu
Did you mistype something, or can you clarify? You write: > A tiny non-zero floor (or any similarly selective boost) is simply a mechanism to break that dead loop....
If you can find something better, it would be very cool. I've tried vanilla self-attention once or twice before, but the challenge is that it's expensive for a 19x19 board,...
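Just to make the scaling concrete, here is a rough illustration of the quadratic term involved (illustrative numbers only, not a description of KataGo's actual architecture):

```python
# Each self-attention layer forms an N x N weight matrix over the N board
# points it treats as tokens, so the cost grows quadratically with board area.
for side in (9, 13, 19):
    n = side * side   # board points as tokens
    pairs = n * n     # entries in one attention matrix, per head per layer
    print(f"{side}x{side}: {n} tokens -> {pairs:,} attention weights per head per layer")
```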
Can you be specific/formal? What numbers do you propose recording in the data and training the neural net to predict? For example, the current policy head, brushing aside details around...
What is the loss function / reward function for the A output, i.e. what incentivizes it to be good as opposed to just random-walking or converging to arbitrary distributions that have...
> I reason it could probably be trained as part of the self play? I don't think you can just handwave at this and say that it gets "trained" when...
Again, can you be more precise? What do you mean by "evaluation results" - do you mean utility, i.e. winrate and score? And can you be more precise about what...
@megabyte0 Keep in mind that AlphaZero-style training *already is based on iterative student-teacher training*, where the policy network learns to predict a vastly stronger teacher (MCTS) that uses ~1000x more...
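As a hedged sketch of what "predicting the teacher" means here (illustrative only, not KataGo's actual training code), the policy target for each self-play position is essentially the normalized visit distribution produced by the search:

```python
import numpy as np

def mcts_policy_target(visit_counts: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Turn root visit counts from an MCTS search into a policy training target."""
    scaled = visit_counts ** (1.0 / temperature)
    return scaled / scaled.sum()

# Hypothetical root visit counts over 362 move slots (361 points + pass on 19x19).
visits = np.zeros(362)
visits[[72, 288, 300]] = [800, 150, 50]
print(mcts_policy_target(visits)[[72, 288, 300]])   # -> [0.8  0.15 0.05]
```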
And how is the "blindspots" network itself supposed to be trained? Is it the same as the way networks are trained right now, except using a teacher MCTS that has...
Basically, you said: > Now we are talking not about the learning phase, but about the using the "current (trained) network/model" the optimal way in like katago gtp kata-analyze commands...
KataGo is trained through AlphaZero-like self-play, which involves training a policy and value network to predict the moves and game outcomes from MCTS using that policy and value network. The...
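For concreteness, a minimal sketch of that objective (illustrative only; the real KataGo loss has additional targets such as score and ownership): the policy head is pushed toward the MCTS move distribution and the value head toward the eventual game outcome.

```python
import numpy as np

def self_play_loss(policy_logits: np.ndarray,
                   mcts_target: np.ndarray,
                   value_pred: float,
                   game_outcome: float) -> float:
    """AlphaZero-style loss for one self-play position (sketch)."""
    # Policy term: cross-entropy between the net's move distribution and the
    # visit distribution that the search produced on this position.
    shifted = policy_logits - policy_logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    policy_loss = -(mcts_target * log_probs).sum()
    # Value term: squared error against the final game result from the
    # player-to-move's perspective (+1 win, -1 loss in this toy encoding).
    value_loss = (value_pred - game_outcome) ** 2
    return policy_loss + value_loss
```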