Access to full move history of puzzle games or raw data dump

Open taha-yassine opened this issue 1 month ago • 9 comments

Hi,

I’d like to retrieve the full move history of all Lichess puzzle games. The API rate limits make it difficult to collect this data at scale.

Is there a raw dump or database export that includes puzzle games with their move histories, similar to the standard game database? If not, is there another recommended way to access this efficiently?

Thanks!

taha-yassine avatar Oct 17 '25 19:10 taha-yassine

Hi

Well the games are available, technically speaking...

ATM there's no recommended way to get them. What do you need them for?

ornicar avatar Oct 18 '25 19:10 ornicar

Thanks for the clarification. I understand that the games are available through the full archives. The issue on my side is that identifying only the puzzle-source games from the global database would require a large amount of additional filtering and processing, which I am trying to avoid.

My goal is to study how language models learn to play chess, and one of my ablations depends on having the full move history for the puzzle positions. Having a more direct way to access those games would be extremely valuable for this research.

taha-yassine avatar Oct 19 '25 11:10 taha-yassine

Hi @taha-yassine. There is this older (~3M vs the current ~5.5M) puzzle dump which was compiled by @mcognetta and that contains game information: https://github.com/mcognetta/lichess-combined-puzzle-game-db

We could look into adding the game information to the Hugging Face version of the data (https://huggingface.co/datasets/Lichess/chess-puzzles).

cakiki avatar Oct 19 '25 12:10 cakiki

You can export games by ID and by batches of 300 with this endpoint https://lichess.org/api#tag/Games/operation/gamesExportIds

The only limit is that you only make one request at a time.

I reckon you should be able to download a lot of games with that.
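
A minimal sketch of that batching approach in Python, using only the standard library. The endpoint path `https://lichess.org/api/games/export/_ids` (a POST with comma-separated game IDs in the body) and the `Accept` header value are assumptions based on the linked API docs rather than anything stated in this thread, and `batches`, `build_request`, and `download` are hypothetical helper names. Requests are issued strictly one after another to respect the one-request-at-a-time limit.

```python
import urllib.request
from typing import Iterator, List, Sequence

# Assumed endpoint path (see the API docs linked above); not quoted in this thread.
EXPORT_URL = "https://lichess.org/api/games/export/_ids"
BATCH_SIZE = 300  # batch size mentioned above


def batches(ids: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` game IDs."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def build_request(batch: Sequence[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request for one batch of game IDs."""
    body = ",".join(batch).encode("ascii")
    return urllib.request.Request(
        EXPORT_URL,
        data=body,
        method="POST",
        # Assumption: PGN output; the API docs describe other formats as well.
        headers={"Content-Type": "text/plain", "Accept": "application/x-chess-pgn"},
    )


def download(ids: Sequence[str], out_path: str) -> None:
    """Fetch all batches sequentially and append the responses to out_path."""
    with open(out_path, "ab") as out:
        for batch in batches(ids):
            # One request at a time: the next request starts only after
            # the previous response has been fully read.
            with urllib.request.urlopen(build_request(batch)) as resp:
                out.write(resp.read())
```

Calling `download(game_ids, "puzzle_games.pgn")` would then write every batch to a single file; retry and back-off handling is left out of the sketch.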

ornicar avatar Oct 19 '25 12:10 ornicar

> Hi @taha-yassine. There is this older (~3M vs the current ~5.5M) puzzle dump which was compiled by @mcognetta and that contains game information: https://github.com/mcognetta/lichess-combined-puzzle-game-db

Thanks for the suggestion. Unfortunately, the MEGA download link doesn't seem to be working anymore.

> We could look into adding the game information to the Hugging Face version of the data (https://huggingface.co/datasets/Lichess/chess-puzzles)

I think this would indeed be very useful!

> You can export games by ID and by batches of 300 with this endpoint https://lichess.org/api#tag/Games/operation/gamesExportIds
>
> The only limit is that you only make one request at a time.
>
> I reckon you should be able to download a lot of games with that.

I wasn’t aware of that endpoint, thanks! I tested it and each request takes ~10 seconds, so fetching all ~5M games would take around 50 hours. That seems manageable as a workaround.
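
The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes a constant ~10 seconds per request and batches of 300, as reported in this thread, and `estimate_hours` is a hypothetical helper name.

```python
import math


def estimate_hours(n_games: int, batch_size: int = 300,
                   seconds_per_request: float = 10.0) -> float:
    """Hours needed to fetch n_games sequentially, one batch per request."""
    n_requests = math.ceil(n_games / batch_size)
    return n_requests * seconds_per_request / 3600


# ~5M games -> 16,667 requests at ~10 s each
print(f"{estimate_hours(5_000_000):.1f} hours")  # 46.3 hours
```

That is roughly two days of continuous downloading, in line with the "around 50 hours" figure.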

taha-yassine avatar Oct 20 '25 10:10 taha-yassine

Hi @taha-yassine

I recovered my old laptop and was able to find the dataset that was in the (now defunct) MEGA link. It is about 4.4 GB on disk when compressed and is a bit out of date (Sept 2022). However, it could still be helpful, since the game information should all still be the same. To get an up-to-date dataset, I think you will just need to:

  1. download the entire puzzle database
  2. join the puzzle database with the game-puzzle database I created
     2.1. update all of the puzzle information in my dataset with the new values (this is just a straight overwrite)
  3. extract the missing games
  4. download all of those games
  5. associate them with the missing puzzles in the dataset.

This would save you a lot of time (my dataset contains 2.9M of the 5.4M puzzles in the current dataset, and downloading the new dataset and updating those values can be done very quickly).
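
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual layout of either dataset: puzzle rows are assumed to be dicts keyed by the puzzle CSV column names (`PuzzleId`, `GameUrl`, ...), the older combined dataset is assumed to be a dict from puzzle ID to a row that also carries the game under a hypothetical `GamePgn` key, and all function names are made up for the example.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse

Row = Dict[str, str]


def game_id_from_url(game_url: str) -> str:
    """Take the game ID from a URL such as https://lichess.org/787zsVup/black#48."""
    first_segment = urlparse(game_url).path.strip("/").split("/")[0]
    return first_segment[:8]  # assumption: game IDs are 8 characters long


def merge_datasets(current: List[Row],
                   old_combined: Dict[str, Row]) -> Tuple[Dict[str, Row], Dict[str, str]]:
    """Steps 2 and 3: overwrite puzzle fields and collect missing game IDs."""
    merged: Dict[str, Row] = {}
    missing: Dict[str, str] = {}  # puzzle ID -> game ID still to download
    for row in current:
        pid = row["PuzzleId"]
        old = old_combined.get(pid)
        if old is not None:
            # Straight overwrite: current puzzle values win, game data is kept.
            merged[pid] = {**old, **row}
        else:
            merged[pid] = dict(row)
            missing[pid] = game_id_from_url(row["GameUrl"])
    return merged, missing


def attach_games(merged: Dict[str, Row], missing: Dict[str, str],
                 games_by_id: Dict[str, str]) -> None:
    """Step 5: associate downloaded games with the puzzles that lacked them."""
    for pid, gid in missing.items():
        if gid in games_by_id:
            merged[pid]["GamePgn"] = games_by_id[gid]
```

Step 4 (downloading the missing games) is left out here; the values of `missing` are the IDs that would need to be fetched.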

If you would find it useful, I am happy to work on getting it to you. You can see the dataset format in this README: https://github.com/mcognetta/lichess-combined-puzzle-game-db?tab=readme-ov-file#example

mcognetta avatar Oct 22 '25 06:10 mcognetta

Hi @mcognetta, it'd be awesome if you could upload it somewhere for me to download.

taha-yassine avatar Oct 22 '25 07:10 taha-yassine

I've sent you a message on Twitter (I don't want to share the link here, though I am not sure if it would actually cause me any issues). I will leave it up for another ~week since it is using a lot of my storage.

If anyone else wants access, please feel free to ping me here.

mcognetta avatar Oct 22 '25 08:10 mcognetta

Thanks everyone for the help! @cakiki I’ll leave the issue open in case you want to track a potential future update to the HF dataset as you suggested, but feel free to close it if you prefer.

taha-yassine avatar Oct 22 '25 09:10 taha-yassine