alpha-zero-general
updateThreshold
I started studying the alpha-zero-general implementation and found this parameter in the main.py module:
'updateThreshold': 0.6, # During arena playoff, new neural net will be accepted if threshold or more of games are won.
And this is from the coach.py module:
def learn(self):
    """
    Performs numIters iterations with numEps episodes of self-play in each
    iteration. After every iteration, it retrains neural network with
    examples in trainExamples (which has a maximum length of maxlenofQueue).
    It then pits the new neural network against the old one and accepts it
    only if it wins >= updateThreshold fraction of games.
    """
Doesn't this differ from the original description in "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"?
In AlphaGo Zero, self-play games were generated by the best player from all previous iterations. After each iteration of training, the performance of the new player was measured against the best player; if it won by a margin of 55% then it replaced the best player and self-play games were subsequently generated by this new player. In contrast, AlphaZero simply maintains a single neural network that is updated continually, rather than waiting for an iteration to complete.
Do I understand correctly that this repo implements the AlphaGo Zero algorithm rather than AlphaZero?
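For readers following along, the gating rule under discussion can be sketched roughly as below. This is only an illustration, not the code from coach.py: the function name `accept_new_net` and its arguments are hypothetical, and ignoring draws when computing the win fraction is an assumption.

```python
def accept_new_net(new_wins: int, old_wins: int, update_threshold: float = 0.6) -> bool:
    """Decide whether the newly trained network replaces the current best one.

    Hypothetical helper for illustration; draws are ignored (an assumption).
    """
    decisive_games = new_wins + old_wins
    if decisive_games == 0:
        # No decisive games were played, so keep the old network.
        return False
    # Accept only if the new network wins at least the threshold fraction.
    return new_wins / decisive_games >= update_threshold


# 23 wins out of 40 decisive games is 57.5%: rejected at 0.6,
# but accepted under the 55% margin quoted for AlphaGo Zero.
print(accept_new_net(23, 17, update_threshold=0.6))
print(accept_new_net(23, 17, update_threshold=0.55))
```

AlphaZero, as quoted above, has no such check: the single network is updated continually and always used for self-play.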
Yep you are correct! Check #137 #74. This repo was actually based on AlphaGo Zero.
Thanks! What do you think about this other difference between AGZ and AZ?
AlphaGo Zero tuned the hyper-parameter of its search by Bayesian optimisation. In AlphaZero we reuse the same hyper-parameters for all games without game-specific tuning. The sole exception is the noise that is added to the prior policy to ensure exploration (29); this is scaled in proportion to the typical number of legal moves for that game type.
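To make the quoted passage concrete, here is a minimal sketch of mixing Dirichlet noise into the root priors. The function name `add_dirichlet_noise` is hypothetical and this is not code from this repo. The mixing weight `epsilon=0.25` follows the AlphaGo Zero paper, and the AlphaZero paper reports `alpha` values of 0.3, 0.15 and 0.03 for chess, shogi and Go, so games with more legal moves get a smaller `alpha`.

```python
import random


def add_dirichlet_noise(priors, alpha, epsilon=0.25, rng=random):
    """Mix Dirichlet(alpha) noise into prior probabilities over legal moves.

    Hypothetical helper for illustration only.
    """
    # A Dirichlet sample is a set of independent Gamma(alpha, 1) draws, normalised.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0.0:
        # Extremely unlikely underflow for tiny alpha: leave priors unchanged.
        return list(priors)
    noise = [g / total for g in gammas]
    # Weighted mix: (1 - epsilon) * prior + epsilon * noise.
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


# Example with three legal moves and a chess-like alpha.
noisy = add_dirichlet_noise([0.5, 0.3, 0.2], alpha=0.3)
print(noisy)
```

The result still sums to 1, because both the priors and the noise are probability distributions.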
Another question: if this topic is still relevant to you, what do you think of the pseudocode.py file from the Supplementary Materials?