
Train for maia:2000 maia:2100 maia:2200

Open jorditg opened this issue 4 years ago • 52 comments

It would be interesting to have other more powerful versions having the same human-like style.

jorditg avatar Feb 01 '21 11:02 jorditg

Making higher-rated Maia models is somewhat challenging, as there are not enough games for us to use the same sampling methods we used for the published models. We have done some experiments with different strategies, but the results have been unsatisfactory. Also, as the rating increases, players' moves approach those of Stockfish/Leela.

I'd also like to note that the goal of our project is not to create the best chess engine; we are trying to create models that can be used for learning/teaching.

reidmcy avatar Feb 03 '21 05:02 reidmcy

Thank you reidmcy for your feedback. I understand the argument about not having enough data for training. In any case, if it were possible to train them for the proposed Elos, take into account that they would also have great value. Consider that Leela and Stockfish are 3500+ engines, and that having a human-like engine of a similar strength can be very valuable for training purposes. 2000-2500 are the most common Elos in OTB amateur chess, which is why it would be valuable to have human-like engines at those ratings.

Thanks for your great work!

jorditg avatar Feb 05 '21 13:02 jorditg

Bumping @jorditg, as I myself would love to have a human-like sparring partner of this strength. #2000-2500PlayerLivesMatter What is the data requirement, @reidmcy, in terms of the number of games needed for this to make sense? For example, what if we just merged this whole rating segment?

FrugoFruit90 avatar Feb 08 '21 13:02 FrugoFruit90

We need about 12 million games, with good endgames so not fast time controls.

Mixing player ratings was one of the first things we tried after the first paper; it gave much weaker results.

reidmcy avatar Feb 08 '21 19:02 reidmcy

Do we have any performance values using, for example, maia9 with 16, 32, 64, 128, etc. tree-search nodes instead of 1?

jorditg avatar Mar 17 '21 11:03 jorditg

Figure 5 of the paper shows 10 rollouts, but we tried other values too, with similar results.

reidmcy avatar Mar 17 '21 15:03 reidmcy

If the number of games is a problem for >1900, why not use approximate rating conversions and also import games from chess.com, FICS and ICC into the training set (the latter two go back 25 years, although ratings may have deflated/inflated over that time)? It might need a bit of tweaking to get right, but I suspect the drop-off in games between 1900 and 2000 is small enough that taking additional games from other servers into account should at least get you to 2000, if not 2100.

However, it does raise a bigger question which I think is an interesting one, and which may explain why you might not get such good results above 1900; it is worth investigating for both AI and chess. The higher the rating, the more the player balances intuition with calculation. A master will say things like "I feel this is the right move" or "I'm not worried about that move", particularly at faster time controls, where use of the clock is a big factor. When the position does get a little more complicated they calculate, often 2 or 3 moves ahead (and as the time control increases, conscious calculation becomes more of a factor).

By using depth 1 and a neural net, you're simulating which move would be played if calculating ahead were disallowed (although tactical patterns are taken into account), and once you go higher this won't predict the move made in positions where the player would have calculated. This works well to simulate sub-2000 play, but not so much above that, I believe. So to make a representative 2000-and-above player (and there is a lot of interest here from improving players looking for a sparring/training opponent), you need an engine that balances choosing moves by intuition against calculating when necessary. Using 'unnecessary' (or always-to-depth) calculation will cause evaluation of moves and positions the human wouldn't consider (even if it finds the objectively best move).

This might be impossible to train, because humans don't really understand intuition well and data on whether the player calculated or not isn't available, except by guessing from how much clock time was used (not always an accurate indicator of tactical complexity). But it might be possible via some heuristic which sets the depth based on how tactical the position is and balances the tree/depth against the skill level being represented; see the sketch below.
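To make that last idea concrete, here is a minimal sketch of the kind of heuristic I mean, assuming python-chess is available. The "tacticality" measure (counting the captures and checks available in the position) and all of the thresholds are invented purely for illustration, not tuned against any data:

```python
# Sketch only: a made-up "tacticality" heuristic for deciding how deep to search.
# Uses python-chess; the capture/check count and all thresholds are arbitrary.
import chess


def tacticality(board: chess.Board) -> int:
    """Crude proxy for sharpness: number of legal captures and checking moves."""
    return sum(
        1
        for move in board.legal_moves
        if board.is_capture(move) or board.gives_check(move)
    )


def choose_depth(board: chess.Board, skill: int) -> int:
    """Pick a nominal search depth from position sharpness and the rating being
    imitated (skill is a rating like 1900; the mapping here is invented)."""
    sharpness = tacticality(board)
    depth = 1                 # quiet position: just play the net's "intuitive" move
    if sharpness >= 6:
        depth = 2             # some tactics available: look a couple of plies ahead
    if sharpness >= 12:
        depth = 3             # very sharp position: calculate more
    if skill >= 2200 and depth > 1:
        depth += 1            # stronger simulated players calculate a bit deeper
    return depth


if __name__ == "__main__":
    print(choose_depth(chess.Board(), 2000))  # 1 in the quiet starting position
```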

wickeduk avatar Sep 11 '21 09:09 wickeduk

I did some experimenting with lowering the quality standards and didn't get good results. Using games from outside Lichess is tricky, as most other servers don't have free archives available. I also can't go violating those sites' terms of use with a scraper, even if the data are available, since I'm doing this as part of my PhD.

I think your second point is interesting; we have had a student looking into something similar since May. We frame it as more of an inverse RL task.

I also think your concerns about depth of search are a bit off; the neural network could be doing some kind of search internally. In fact, the model is designed to extract information sequentially, so it is almost certainly doing some kind of search. So depth-1 search doesn't mean the same thing to Maia/Leela as it does to Stockfish or a human.

reidmcy avatar Sep 15 '21 08:09 reidmcy

It's possible to download games from chess.com (by player at least, I'm not sure about by rating), and FICS has a database at ficsgames.org. FICS' database goes back more than 20 years; the numbers are smaller, but if you add it to the set it may get you above the number you need. If you asked nicely at ICC (assuming there isn't already a download), they may be able to help you too, and they would have more games around the 2000 mark. The only problem would be standardising approximate ratings across servers, for which there are a number of surveys out there.

David

wickeduk avatar Sep 15 '21 08:09 wickeduk

The Chess.com per-player downloads are what I was alluding to with the scraping comment.

reidmcy avatar Sep 16 '21 08:09 reidmcy

Making higher-rated Maia models is somewhat challenging, as there are not enough games for us to use the same sampling methods we used for the published models. We have done some experiments with different strategies, but the results have been unsatisfactory. Also, as the rating increases, players' moves approach those of Stockfish/Leela.

I found some resources you can scrape from (I already wrote this here: https://github.com/CSSLab/maia-chess/issues/43#issuecomment-1448598614):

FICS Games: a free resource that offers a large selection of matches and allows you to sort by 2000-2199 ELO, 2200-2399 ELO, etc.- https://www.ficsgames.org/

Chessbase: a paid service that provides access to millions of matches played by higher ELO players- https://database.chessbase.com/

Lichess: another free option with over 8 million matches, including higher-elo games, sorted by month- https://database.lichess.org/

Chesstempo: offers over 2 million searchable games and allows you to sort by min and max ELO in advanced settings- https://old.chesstempo.com/game-database.html

EDIT: Some other resources I discovered-

https://www.kaggle.com/datasets/datasnaek/chess
https://www.kaggle.com/datasets/zq1200/world-chess-championships-1866-to-2021
https://www.kaggle.com/datasets/ronakbadhe/chess-evaluations
https://www.chess.com/games
https://www.chessabc.com/en/chessgames
https://gameknot.com/chess-games-database.pl
https://sourceforge.net/projects/codekiddy-chess/
http://www.chessgameslinks.lars-balzer.info/
https://www.openingmaster.com/chess-databases
https://www.365chess.com/chess-games.php

codeisnotfunjk avatar Feb 28 '23 18:02 codeisnotfunjk

It would be interesting to have other more powerful versions having the same human-like style.

Someone made a higher ELO version of maia on Lichess. Link is below.

https://lichess.org/@/humanian

codeisnotfunjk avatar Mar 01 '23 16:03 codeisnotfunjk

We can't easily combine games from different sources since Elo is not consistent between them. We also need about 12 million games after filtering, so most of those sites are much too small. I also need to use sites whose licenses allow this use. I have been running experiments with Lichess, since there are many more games available now than at the time of the last paper, but I don't have anything to release publicly yet.

Also, keep in mind this is an academic project. The goal is not just to release better models, the goal is to do unique things and release new ideas. So releasing new models will need to be part of a larger project.

reidmcy avatar Mar 01 '23 18:03 reidmcy

It would be interesting to have other more powerful versions having the same human-like style.

Someone made a higher ELO version of maia on Lichess. Link is below.

https://lichess.org/@/humanian

That appears to be our Maia-1800 weights using MCTS search. As we showed in the original paper, using search reduces humanness of the engine. So that's closer to a weak Leela model than our Maia models.

reidmcy avatar Mar 01 '23 18:03 reidmcy

It would be interesting to have other more powerful versions having the same human-like style.

Someone made a higher ELO version of maia on Lichess. Link is below.

https://lichess.org/@/humanian

That appears to be our Maia-1800 weights using MCTS search. As we showed in the original paper, using search reduces humanness of the engine. So that's closer to a weak Leela model than our Maia models.

Hey :) I'm interested in whether there is data on how the estimated Elo of a Maia model changes when using MCTS (even if it loses humanness). For example, what would be the Elo of the 1900 Maia with search depth 3? Any experiments there?

hwuebben avatar May 19 '23 13:05 hwuebben

There was some work showing that a modified version of MCTS increases humanness: https://arxiv.org/abs/2112.07544.

Estimating Elo is difficult to do, as it's not a fixed number; it's based on the community's interactions with the player. Thus even comparing Elo between different chess servers is non-trivial. I have run experiments showing that the KL-regularized MCTS increases win rate, but I can't directly calculate an Elo for the resulting model.

reidmcy avatar May 26 '23 05:05 reidmcy

I did some (quick and dirty) experiments to see how Maia behaves when using MCTS and found that each additional ply of depth increases Elo by approximately 400 points. More specifically, I let the different Maias play each other at varying depth settings and found that Maia1100d2 (Maia1100 at depth 2) shows similar performance to Maia1500, and furthermore:

Maia1100d2 ≈ Maia1500
Maia1100d3 ≈ Maia1900
Maia1500d2 ≈ Maia1900

Now, if you are brave, you could assume that Maia1600d2 ≈ Maia2000 and so on... Take it with a grain of salt though ;)
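The extrapolation can be written as a one-line formula. This is just my own rough fit to the pairings above, so there is no guarantee it holds further out:

```python
# Back-of-the-envelope estimate: each ply of search beyond 1 adds roughly 400 Elo.
# The 400-point step comes from the quick-and-dirty matches above, nothing more.
def estimated_elo(base_elo: int, depth: int, points_per_ply: int = 400) -> int:
    return base_elo + (depth - 1) * points_per_ply


for base in (1100, 1500, 1900):
    for depth in (1, 2, 3):
        print(f"Maia{base} at depth {depth} is roughly {estimated_elo(base, depth)}")
```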

In terms of humanness, I assume that a Maia at depth 2 or 3 still plays more human-like than a Stockfish of similar strength (if you can even achieve that). I could be wrong though.

If there is interest I could provide more information.

hwuebben avatar May 26 '23 09:05 hwuebben

It would be interesting to have other more powerful versions having the same human-like style.

Not to worry! I'm currently training a Maia model targeting an ELO rating of around 2500.

I previously trained Leela Chess Zero using supervised training data from the Lichess Elite Database, which has games ranging from 2100 to 2500 ELO rating. The net, EliteLeela, can be found on Lichess as a bot. However, I haven't run the bot in a very long time.

The great news is that the Lichess Elite Database has a total of over 19.7 million games, so it should be good to use for training.

I'll be letting my computer run for the next couple of days for it to train the model. Once it's done, I'll do the test match and see what rating it could be at using Maia 1900 as a baseline.

@reidmcy I'll send you the results and model when it's done, if you want.

CallOn84 avatar Jul 17 '23 08:07 CallOn84

@CallOn84 I'm very interested in a stronger-than-1900 human-like engine, and I want to encourage you to continue your work. I'm simply a user of all these machine learning / AI tools, but a programmer by day, so perhaps I can help in some way.

Just one question: when you say "ELO rating of around 2500", do you mean Lichess rating? I'm not trying to be pedantic; it's just that Lichess doesn't use Elo, the rating there is Glicko2, but many players do have FIDE titles and it's not difficult to match their FIDE Elo to their Lichess rating.

purefan avatar Aug 05 '23 09:08 purefan

@CallOn84 I'm very interested in a stronger-than-1900 human-like engine, and I want to encourage you to continue your work. I'm simply a user of all these machine learning / AI tools, but a programmer by day, so perhaps I can help in some way.

Just one question: when you say "ELO rating of around 2500", do you mean Lichess rating? I'm not trying to be pedantic; it's just that Lichess doesn't use Elo, the rating there is Glicko2, but many players do have FIDE titles and it's not difficult to match their FIDE Elo to their Lichess rating.

Yes, I meant 2500 Glicko2 rating, which is around 2000 Elo. The issue I'm facing right now is training data, as there isn't really enough of it to make this work. From the Lichess open database, I was getting an average of 100,000 blitz, rapid and classical games around the 2500 Glicko2 rating area, which isn't enough, as I need 12+ million games.

Now, this could be “fixed” by supplementing Lichess games with OTB games and chess.com games, but there's a bit of an issue when it comes to finding them. Chess.com's database games aren't open the way Lichess's are, and OTB games require extensive research into 2000 Elo-rated players in order to download their games.

So, that's the current issue right now. Otherwise, the training itself isn't that hard to do, except that it's on Linux and I hate running Linux with a passion.

CallOn84 avatar Aug 06 '23 13:08 CallOn84

@CallOn84

Surely there are over 100k quality games played on Lichess across all the years available. How many games and what criteria do you need? I can download and filter from https://database.lichess.org/

Also, if you point me to a doc, I'll be happy to run this on Linux (I love running Linux with a passion :joy: )

purefan avatar Sep 18 '23 11:09 purefan

@CallOn84

Surely there are over 100k quality games played on Lichess across all the years available. How many games and what criteria do you need? I can download and filter from https://database.lichess.org/

Also, if you point me to a doc, I'll be happy to run this on Linux (I love running Linux with a passion 😂)

There are over 100k games around a 2500 Glicko2 rating; the issue is that there aren't 12 million of them. Trust me, I spent hours going through each year and only got around 100,000 games per year.

CallOn84 avatar Sep 18 '23 12:09 CallOn84

@CallOn84 Surely there are over 100k quality games played on Lichess across all the years available. How many games and what criteria do you need? I can download and filter from https://database.lichess.org/ Also, if you point me to a doc, I'll be happy to run this on Linux (I love running Linux with a passion 😂)

There are over 100k games around a 2500 Glicko2 rating; the issue is that there aren't 12 million of them. Trust me, I spent hours going through each year and only got around 100,000 games per year.

Well, I just downloaded last month's archive and found 323k games with these criteria:

  • Elo >= 2500
  • Time >= 180 seconds

I filtered them with pgn-extract. The 1GB file (zipped down to 210MB) can be found here.

Now, I know I'm guesstimating, but it doesn't seem unrealistic to find 3.5m-4m games per year, and while the further back we go the fewer games there are, the archive goes all the way back to 2013. If those criteria (Elo and time control) satisfy the need, I really think it's feasible to find 12m games in the archive.
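For anyone without pgn-extract handy, below is a roughly equivalent Python sketch using python-chess. The file name is a placeholder for whichever monthly dump you downloaded (decompressed), and WhiteElo, BlackElo and TimeControl are the standard tags in the Lichess exports:

```python
# Count games in a decompressed Lichess monthly dump matching the criteria above:
# both players rated >= 2500 and a starting time of at least 180 seconds.
import chess.pgn


def keep(headers: chess.pgn.Headers) -> bool:
    try:
        white = int(headers.get("WhiteElo", 0))
        black = int(headers.get("BlackElo", 0))
        base_time = int(headers.get("TimeControl", "0+0").split("+")[0])
    except ValueError:  # missing or "?" ratings, correspondence games, etc.
        return False
    return white >= 2500 and black >= 2500 and base_time >= 180


count = 0
with open("lichess_db_standard_rated_2023-08.pgn") as pgn:  # placeholder file name
    while True:
        headers = chess.pgn.read_headers(pgn)  # reads tags only, skips the moves
        if headers is None:
            break
        if keep(headers):
            count += 1

print(count, "games matched")
```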

purefan avatar Sep 19 '23 13:09 purefan

@CallOn84 Surely there are over 100k quality games played on Lichess across all the years available. How many games and what criteria do you need? I can download and filter from https://database.lichess.org/ Also, if you point me to a doc, I'll be happy to run this on Linux (I love running Linux with a passion 😂)

There are over 100k games around a 2500 Glicko2 rating; the issue is that there aren't 12 million of them. Trust me, I spent hours going through each year and only got around 100,000 games per year.

Well, I just downloaded last month's archive and found 323k games with these criteria:

  • Elo >= 2500
  • Time >= 180 seconds

I filtered them with pgn-extract. The 1GB file (zipped down to 210MB) can be found here.

Now, I know I'm guesstimating, but it doesn't seem unrealistic to find 3.5m-4m games per year, and while the further back we go the fewer games there are, the archive goes all the way back to 2013. If those criteria (Elo and time control) satisfy the need, I really think it's feasible to find 12m games in the archive.

Did you filter out the bullet and hyperbullet games as well?

CallOn84 avatar Sep 20 '23 17:09 CallOn84

@CallOn84

Well, I just downloaded last month's archive and found 323k games with these criteria:

  • Elo >= 2500
  • Time >= 180 seconds

Did you filter out the bullet and hyperbullet games as well?

Yes, it only includes games with a starting time of 3 minutes or more, so I'm actually excluding games like 2+10, which is, according to Lichess, blitz.

purefan avatar Sep 21 '23 06:09 purefan

You also need to remove games where either player is a bot, and games that don't have enough moves left after dropping low-clock moves. Also, are you limiting the difference in rating between the players? We required both players to be of similar rating, i.e., both in the same bin.

I've found that using wide Elo ranges leads to worse results, i.e., a 2100-only model performs better than a model trained on 2100-2500 rated players, even though the training data is a strict superset.
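Roughly, the idea looks like the sketch below. This is an illustration only, not the code from our pipeline; the 100-point bin width and the 30-ply minimum are example values, and it assumes the dump marks bot accounts via the WhiteTitle/BlackTitle tags as the Lichess exports do:

```python
# Illustration only, not the actual pipeline code. Example extra filters:
# no bots, a minimum game length, and both players in the same rating bin.
import chess.pgn


def rating_bin(elo: int, width: int = 100) -> int:
    """Map a rating to a bin, e.g. 2147 -> 2100 with a 100-point bin width."""
    return (elo // width) * width


def passes_extra_filters(headers: chess.pgn.Headers, num_plies: int) -> bool:
    # Bot accounts show up as WhiteTitle/BlackTitle = "BOT" in Lichess exports.
    if headers.get("WhiteTitle") == "BOT" or headers.get("BlackTitle") == "BOT":
        return False
    if num_plies < 30:  # example minimum length, counted after cleaning
        return False
    try:
        white = int(headers.get("WhiteElo", 0))
        black = int(headers.get("BlackElo", 0))
    except ValueError:
        return False
    return rating_bin(white) == rating_bin(black)  # both players in the same bin
```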

reidmcy avatar Sep 21 '23 18:09 reidmcy

You also need to remove games where either player is a bot, and games that don't have enough moves left after dropping low-clock moves. Also, are you limiting the difference in rating between the players? We required both players to be of similar rating, i.e., both in the same bin.

I've found that using wide Elo ranges leads to worse results, i.e., a 2100-only model performs better than a model trained on 2100-2500 rated players, even though the training data is a strict superset.

Did you use pgn-extract to get your games? If so, how did you remove bot players, games without enough moves, and low-clock moves?

CallOn84 avatar Sep 21 '23 18:09 CallOn84

I wrote my own parser; there's an early version of it in this repo.

reidmcy avatar Sep 22 '23 07:09 reidmcy

On a side thread: what would happen if, say, you trained the 1900 model with a large set of games but continued the training with a smaller set of 2200-rated games? Would this increase the rating (maybe not quite to 2200) while simulating someone improving from a base level? This could be another avenue for generating a stronger human-like engine. And what if this was done in steps through various plateau ratings, say 1100, 1300, 1700, 1900; would this produce an even more human-like engine?

David

wickeduk avatar Sep 22 '23 07:09 wickeduk

You also need to remove games where either player is a bot, and games that don't have enough moves left after dropping low-clock moves. Also, are you limiting the difference in rating between the players? We required both players to be of similar rating, i.e., both in the same bin.

I've found that using wide Elo ranges leads to worse results, i.e., a 2100-only model performs better than a model trained on 2100-2500 rated players, even though the training data is a strict superset.

Hello @reidmcy

I only saw the rating and time requirements, but let's formalize them. I'm happy to filter further and try to get a useful data set. Would you agree these are accurate:

  • Games must have a starting time of at least 180 seconds (blitz, rapid or classical)
  • Games must not be played by a bot
  • Both players must be rated at least 2500 on Lichess
  • Games must be at least 30 half-moves long

Anything else? Anything to change?
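And to make sure I've understood the checks, here is a rough Python sketch of those four bullet points applied in one pass over a decompressed dump. The file names are placeholders, and I'm assuming bot accounts are marked via the WhiteTitle/BlackTitle tags:

```python
# Sketch of the four criteria above applied to a decompressed Lichess dump.
# File names are placeholders; accepted games are written back out as PGN.
import chess.pgn


def accept(game: chess.pgn.Game) -> bool:
    h = game.headers
    if h.get("WhiteTitle") == "BOT" or h.get("BlackTitle") == "BOT":
        return False                                     # no bots
    try:
        if int(h.get("WhiteElo", 0)) < 2500 or int(h.get("BlackElo", 0)) < 2500:
            return False                                 # both rated at least 2500
        if int(h.get("TimeControl", "0+0").split("+")[0]) < 180:
            return False                                 # starting time >= 180s
    except ValueError:
        return False
    return sum(1 for _ in game.mainline_moves()) >= 30   # at least 30 half-moves


with open("lichess_dump.pgn") as src, open("filtered.pgn", "w") as dst:
    while True:
        game = chess.pgn.read_game(src)
        if game is None:
            break
        if accept(game):
            print(game, file=dst, end="\n\n")
```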

purefan avatar Sep 22 '23 20:09 purefan