
The asynchronous approach achieving over 5x the selfplay throughput is not intuitive.

bjiyxo opened this issue 6 years ago • 8 comments

In the paper,

The asynchronous approach achieves over 5x the selfplay throughput of the synchronous approach on our hardware setup.

Could you please explain this in more detail? 5x is not intuitive. I'm not sure if it is related to the Eval clients comparing the models and then discarding current games. If so, then maybe you don't have to discard the games to prevent the speed loss (up to 5x).

bjiyxo avatar Feb 13 '19 16:02 bjiyxo

Thanks for your comments. The bottleneck is indeed in evaluation. In the sync version, we have to wait until all 400 games have been evaluated, and then decide whether we continue evaluating the next model or update the clients with the new model (which has a >55% winrate against the current one). Note that this evaluation can lag quite a bit because some clients might die or stop responding.
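
Roughly, the gating loop looks like the following sketch (the callables here are hypothetical placeholders passed in as arguments, not our actual code):

```python
def synchronous_gating(candidates, play_eval_games, broadcast_model,
                       eval_games=400, gating_winrate=0.55):
    """Hypothetical sketch: play_eval_games(candidate, n) returns a list of
    0/1 results for the candidate vs. the current best; broadcast_model(model)
    tells every selfplay client to discard its ongoing games and reload."""
    promoted = []
    for candidate in candidates:
        # Block until all evaluation games have finished. If some clients die
        # or stop responding, this wait can lag badly -- the bottleneck.
        results = play_eval_games(candidate, eval_games)
        if sum(results) / len(results) > gating_winrate:
            broadcast_model(candidate)   # promote: clients restart with it
            promoted.append(candidate)
        # Otherwise keep the current best and evaluate the next candidate.
    return promoted
```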

yuandong-tian avatar Feb 14 '19 21:02 yuandong-tian

When there is a new model, all the previous selfplay games with the previous models need to be discarded.

yuandong-tian avatar Feb 14 '19 21:02 yuandong-tian

Then the conclusion that AZ > AGZ may not hold. Suppose the AGZ pipeline instead follows this process: after a client finishes its 400 evaluation games, it goes back to generating self-play games with the current best model; once the server has a new model, the client finishes its current self-play game and then reloads the new model. Under that process the loss in self-play throughput should be kept below 1.5x, and the comparison of the AZ and AGZ methods would be more convincing.
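
For concreteness, the client behaviour I have in mind is something like this sketch (all helpers are hypothetical and passed in as callables):

```python
def selfplay_client_loop(get_latest_model, play_one_game, submit_game,
                         num_games):
    """Hypothetical sketch of the behaviour proposed above: never discard a
    game; only check for a newly promoted model between games."""
    model = get_latest_model()
    for _ in range(num_games):
        record = play_one_game(model)   # current game always runs to the end
        submit_game(record)
        latest = get_latest_model()
        if latest is not model:         # server promoted a new model
            model = latest              # reload between games, not mid-game
```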

bjiyxo avatar Feb 15 '19 04:02 bjiyxo

Anyway, thank you for your patience.

bjiyxo avatar Feb 15 '19 04:02 bjiyxo

When there is a new model, all the previous selfplay games with the previous models need to be discarded.

I see from the paper that the ongoing games with the previous model were discarded:

If the new model is better than the current one by 55%, then the server notifies all the clients to discard current games, and restart the loop.

But the previous finished games were not necessarily discarded:

On the server side, the selfplay games from the previous model can either be removed from the replay buffer or be retained.

I think when the evaluation result is pending, you don't stop playing games. So does 5x throughput mean that 80% of the games have to be discarded with the AGZ approach? Since #discarded games = #promoted models × (#simultaneous games)/2 (in terms of time cost) and #simultaneous games = 32/GPU × 2000 GPUs = 64,000, we have #discarded games/#total games = 32,000 × #promoted models/#total games = 80%, so you need about 40,000 games to promote a model, which is about right. I'm not sure that not including discarded games in the throughput is the right practice, but that would be understandable. This could be clarified in the paper.
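
A quick back-of-the-envelope check of that arithmetic (figures as quoted above, not from any logs):

```python
games_per_gpu = 32
num_gpus = 2000
simultaneous = games_per_gpu * num_gpus        # 64,000 games in flight

# 5x throughput loss <=> ~80% of games discarded (useful fraction = 1/5)
discard_fraction = 1 - 1 / 5                   # 0.8

# Each promotion discards the in-flight games, costing ~half a game each
# on average (games are roughly half finished when thrown away).
discarded_per_promotion = simultaneous / 2     # 32,000 game-equivalents

# discarded / total = 0.8  =>  total games per promoted model
games_per_promotion = discarded_per_promotion / discard_fraction
print(simultaneous, discarded_per_promotion, games_per_promotion)
# -> 64000 32000.0 40000.0
```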

However, I can't find in the AGZ paper that unfinished games are discarded when a model is promoted, and it's hard to understand why you chose to do so. In fact, Leela Zero's autogtp clients only communicate with the server and check for a new model when a game has just finished. Maybe in your setting the clients don't send requests to the server (anyway, the clients only need to place the finished game in the replay buffer). But you just need to pause the games, load the new model, and resume playing instead of discarding, as you do with the AZ approach. I think it'd be a fair comparison if you did this, though if the evaluation is lagging, you definitely improve more slowly than if it isn't. There are ways to speed up evaluation. LZ's server usually sends more than 400 evaluation requests to clients exactly because some clients may die or be unresponsive. They also use SPRT to promote a net before 400 games are done (see the sketch below). (But beware that short games come back with results first and long games are more likely to never come back, introducing a bias.)
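
For reference, the SPRT decision rule looks roughly like this sketch; the p0/p1/alpha/beta values are illustrative, not LZ's actual server settings:

```python
import math

def sprt_status(wins, losses, p0=0.50, p1=0.55, alpha=0.05, beta=0.05):
    """Sequential probability ratio test on win/loss results (draws ignored):
    stop early once the candidate's winrate is clearly closer to p1 (promote)
    or to p0 (reject)."""
    llr = wins * math.log(p1 / p0) + losses * math.log((1 - p1) / (1 - p0))
    if llr >= math.log((1 - beta) / alpha):    # accept H1: promote
        return "promote"
    if llr <= math.log(beta / (1 - alpha)):    # accept H0: keep current best
        return "reject"
    return "continue"                          # keep playing evaluation games
```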

It seems that most people think that the most important differences between AGZ and AZ are the number of rollouts (1600 vs. 800) and 55% gating vs. continually updating the model. There's a diagram in the supplementary material of DeepMind's Science paper showing AZ with symmetries trains much faster than AGZ, which I think is mostly due to these two changes. http://science.sciencemag.org/content/sci/suppl/2018/12/05/362.6419.1140.DC1/aar6404-Silver-SM.pdf

In summary, maybe it's appropriate to rename AGZ/AZ mode to something else.

alreadydone avatar Feb 15 '19 04:02 alreadydone

There's a diagram in the supplementary material of DeepMind's Science paper showing AZ with symmetries trains much faster than AGZ, which I think is mostly due to these two changes.

We are not sure DeepMind used the same amount of hardware for both AGZ and AZ. If the total amounts of hardware are different, then the comparison loses its meaning.

bjiyxo avatar Feb 15 '19 04:02 bjiyxo

According to the report here, AlphaGo Zero uses 2,000 TPUs for data generation. In the AGZ paper, AGZ 20b gets 4.9M self-play games in 3 days, which is about 816 games/TPU/day. In the AZ paper, AZ 20b gets 140M games in 13 days using 5,000 TPUs, which is about 2,153 games/TPU/day. If we take the 1600/800-playout difference into consideration, AZ is only 32% faster in game generation. So IMHO, we may need more tests before we reach the conclusion that the AZ method is more efficient.
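
The arithmetic, spelled out (all figures as quoted above):

```python
agz_games, agz_days, agz_tpus = 4.9e6, 3, 2000
az_games, az_days, az_tpus = 140e6, 13, 5000

agz_rate = agz_games / (agz_tpus * agz_days)    # ~816 games/TPU/day
az_rate = az_games / (az_tpus * az_days)        # ~2153 games/TPU/day

# AGZ uses 1600 playouts/move vs. AZ's 800, so scale AGZ's rate by 2.
speedup = az_rate / (agz_rate * 2)              # ~1.32, i.e. ~32% faster
print(round(agz_rate), round(az_rate), round(speedup, 2))
```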

godmoves avatar Feb 15 '19 05:02 godmoves

  • The total amount of computation could serve as an objective measure. AZ was trained for 320.7 h (~13 d) with 140 million games at 800 playouts, from which we infer that AZ Symmetries was trained for 32.9 h (by measuring pixels) with 14.4 million games at 800 playouts, which exceeds AGZ 20b's 4.9 million games at 1600 playouts by 47% (see the arithmetic sketched below), only to defeat it 61% of the time. Surprising and non-obvious from a cursory reading of the papers, but it seems AZ is in fact not as efficient! Maybe they continued for too long after the 20b model saturated.

(image: training curves from the AlphaZero supplementary material)

Also note that AZ is close to (though generally ahead of) AGZ in terms of number of training steps to achieve the same strength. (The number of games AZ sees in one step is ~3x (147%x2) that of AGZ, so it's less prone to overfitting.)
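
Here is the arithmetic behind the 47% and ~3x figures, using the numbers quoted/inferred above (the 32.9 h value was read off the plot by measuring pixels, so treat it as approximate):

```python
az_hours, az_games = 320.7, 140e6            # full AZ run, 800 playouts/move
az_sym_hours = 32.9                          # AZ Symmetries (read off the plot)
az_sym_games = az_games * az_sym_hours / az_hours    # ~14.4M games @ 800p

agz_games, agz_playouts = 4.9e6, 1600        # AGZ 20b
az_playouts = 800

# Total search computation ~ games x playouts per move.
compute_ratio = (az_sym_games * az_playouts) / (agz_games * agz_playouts)
games_ratio = az_sym_games / agz_games       # games per training step, ~3x
print(round(az_sym_games / 1e6, 1), round(compute_ratio, 2), round(games_ratio, 2))
# -> 14.4 1.47 2.93
```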

  • ~~Upon re-examining the AGZ paper, they actually call their pipeline asynchronous :)~~ (I see the distinction between "synchronized" and "synchronous".) From the AGZ paper:

AlphaGo Zero’s self-play training pipeline consists of three main components, all executed asynchronously in parallel. Neural network parameters θ_i are continually optimized from recent self-play data; AlphaGo Zero players α_{θ_i} are continually evaluated; and the best performing player so far, α_{θ*}, is used to generate new self-play data. Each mini-batch of data is sampled uniformly at random from all positions of the most recent 500,000 games of self-play. The optimization process produces a new checkpoint every 1,000 training steps. This checkpoint is evaluated by the evaluator and it may be used for generating the next batch of self-play games.

@godmoves Thanks for the info. Evaluation games won't be able to explain away the 32% speedup; other factors like game length and resign threshold may come into play. 700 checkpoints were produced during AGZ 20b training, which ran for 700,000 steps, so there were 400 × 700 = 280,000 evaluation games, only 5.7% of 4,900,000, the total number of games. However, evaluation games may be slower than self-play games because there's less batching, unless you let the two models run on different TPUs.
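
Spelled out (assuming one checkpoint per 1,000 training steps, as the AGZ paper states):

```python
training_steps = 700_000
checkpoints = training_steps // 1_000         # 700 candidate models
eval_games = 400 * checkpoints                # 280,000 evaluation games
selfplay_games = 4_900_000
print(eval_games, round(eval_games / selfplay_games, 3))   # -> 280000 0.057
```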

(Note to self: 1600p is strictly twice the computation of 800p, whereas LZ's 1600v is not necessarily twice 800v. Likewise, AGZ's evaluation games don't cost less time than self-play games unless the resign thresholds are different.)

alreadydone avatar Feb 15 '19 06:02 alreadydone