AlphaNLHoldem

How do you judge or track the convergence of the Hold'em model?

Open • Josh00-Lu opened this issue 11 months ago • 1 comment

Even after ~1 billion self-play, with over 1000 checkpoints, the model still does not seem to have converged.

Thanks for your wonderful project. May I ask how you judge the convergence of a self-play game model? Are there any evaluation metrics you would recommend?

I don't think evaluating with rewards (or utilities) is a good choice, because during training both sides of the self-play are improving at the same time, so the reward does not reflect absolute progress.
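To illustrate the concern, here is a toy sketch (not this project's code). It assumes a symmetric zero-sum game in which each checkpoint has a single latent "skill" number; `expected_payoff` and the `skills` values are made up purely for illustration. It compares the reward a checkpoint gets against a copy of itself with the reward it gets against a frozen early checkpoint.

```python
import math


def expected_payoff(skill_a: float, skill_b: float) -> float:
    """Toy zero-sum payoff for player A; antisymmetric in the two skills."""
    return math.tanh(skill_a - skill_b)


# Hypothetical latent skill per checkpoint, assumed to grow with training.
skills = [0.2 * k for k in range(6)]

# Self-play: each checkpoint plays a copy of itself.
self_play = [expected_payoff(s, s) for s in skills]

# Cross-play: each checkpoint plays the frozen first checkpoint.
vs_frozen = [expected_payoff(s, skills[0]) for s in skills]

print("self-play :", [round(x, 3) for x in self_play])
print("vs frozen :", [round(x, 3) for x in vs_frozen])
```

In this toy setting the self-play reward is zero for every checkpoint, while the reward against the frozen opponent rises as skill grows. That is why I suspect some evaluation against fixed opponents would be more informative than the training reward, but I am not sure what is recommended for this project.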

Josh00-Lu, Feb 24 '24 03:02