Bjarke Ebert
A related issue is that all games from a period are correlated / biased, since they are produced by the same ID and same search algo (although randomized). So even...
Sure, the loss on older validation games is expected to be bigger than on more recent validation games. But not by *a lot*. So the monitoring could be used to...
I find it strange that each game is used in multiple different training sessions. But given that we do, here's a modification of the idea of this issue: In each...
I am well aware of the recursive aspect of letting the network learn what an 800-node search would find :) > Because those 800 nodes will be searching for that...
Right, there's a noisy aspect of the final outcome. But that should help us: It will promote some moves out of lost positions, proportionally to how close those moves are...
I think the training is not the bottleneck here, rather the game generation is, right? So why not just fork a network and train it in parallel? Maybe I'll just...
Another way to express my proposal is to use learning weights in the policy-loss SGD: three different weights depending on game outcome (loss, draw, win). Currently those are 1, 1,...
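To make the weighting idea concrete, here is a minimal sketch of a policy loss where each training position is scaled by a weight chosen from the game outcome. Everything in it is an assumption for illustration: the names `OUTCOME_WEIGHTS`, `cross_entropy` and `weighted_policy_loss` are hypothetical, the weight values are placeholders rather than proposed settings, and the outcome is taken to be from the point of view of the side that made the move. It is not the project's actual training code.

```python
import math

# Hypothetical per-outcome weights for the policy loss. The values are
# placeholders for illustration only, not taken from the discussion.
OUTCOME_WEIGHTS = {"loss": 0.5, "draw": 1.0, "win": 1.5}


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target move distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def weighted_policy_loss(samples, weights=OUTCOME_WEIGHTS):
    """Weighted mean policy loss over (target, predicted, outcome) samples.

    Each sample's cross-entropy is scaled by the weight of its game outcome,
    and the sum is normalised by the total weight so that the loss scale
    stays comparable when the weights change.
    """
    total = 0.0
    weight_sum = 0.0
    for target, predicted, outcome in samples:
        w = weights[outcome]
        total += w * cross_entropy(target, predicted)
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Two toy positions with two legal moves each.
samples = [
    ([1.0, 0.0], [0.5, 0.5], "win"),
    ([1.0, 0.0], [0.25, 0.75], "loss"),
]
print(weighted_policy_loss(samples))
```

With all three weights equal, this reduces to the ordinary unweighted mean of the per-position policy losses, so equal weights reproduce plain training and any other choice only shifts the emphasis between positions.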
#551 seems very related. Maybe my #604 is a dupe.