AlphaNLHoldem
How do you judge/track the convergence of the holdem model?
Even after ~1 billion self-play and over 1000 checkpoints, the model still does not seem to converge
Thanks for your wonderful project. May I ask how you judge the convergence of a self-play game model? Are there any evaluation metrics you would recommend?
I don't think evaluating with rewards (or utilities) is a good choice, because during training both sides of the self-play are improving at the same time, so the reward alone does not show whether the model is actually getting stronger.
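To make the concern concrete, here is a minimal toy sketch (not code from this project). It assumes a hypothetical symmetric zero-sum game where each checkpoint has a made-up scalar `skill` and the win probability follows a logistic curve. Under those assumptions, the expected self-play reward stays at zero no matter how strong the checkpoint is, while the reward against a frozen baseline opponent does move:

```python
import math


def expected_reward(skill_a: float, skill_b: float) -> float:
    """Expected zero-sum payoff for player A in a toy game.

    Assumption: win probability is logistic in the skill gap. This is an
    illustrative model only, not how the holdem model is scored.
    """
    win_prob = 1.0 / (1.0 + math.exp(-(skill_a - skill_b)))
    return 2.0 * win_prob - 1.0  # +1 for a win, -1 for a loss


# Hypothetical skill values for successive checkpoints (made up).
checkpoint_skills = [0.0, 0.5, 1.0, 2.0]
# Frozen reference opponent, e.g. an early checkpoint (assumption).
baseline_skill = 0.0

# Self-play: the checkpoint plays against a copy of itself.
self_play_rewards = [expected_reward(s, s) for s in checkpoint_skills]
# Fixed-opponent evaluation: the checkpoint plays the frozen baseline.
vs_baseline_rewards = [
    expected_reward(s, baseline_skill) for s in checkpoint_skills
]

for skill, sp, vb in zip(checkpoint_skills, self_play_rewards, vs_baseline_rewards):
    print(f"skill={skill:.1f}  self-play={sp:+.3f}  vs-baseline={vb:+.3f}")
```

In this toy setting the self-play column is flat while the fixed-opponent column increases with skill, which is why I am looking for a metric other than the training reward.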