rust-pgn-reader
Parallelization for Reading Large PGN Files
I was wondering whether it's possible to implement some sort of parallelization for reading huge PGN files. For example, split the file into smaller chunks, either on disk or in memory, and then run the reader on each chunk. If it's possible, why hasn't it been implemented in this crate yet? Thanks.
A (completely correct) PGN parser cannot easily be parallelized at this level. For example, an unclosed { anywhere earlier in the file might mean that later chunks are actually part of a comment, even if they look like games.
It's not entirely impossible:
- A parser could speculatively split the PGN on a boundary that looks like a game end and fix/reparse chunks in the rare case that the decision turned out to be wrong.
- A parser could do a fast pass (similar to the skipping mode in this library) to determine game boundaries, followed by a slower but parallelizable pass.
I haven't done this because so far there have always been opportunities for more coarse-grained parallelism (e.g., https://database.lichess.org/ coming in multiple independent files already).
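
For illustration, the second option (a fast sequential pass for game boundaries, then a parallel pass) might look roughly like the sketch below. It uses only the standard library and does not touch this crate's API. The names `game_starts`, `split_games`, `count_tokens` and `parse_parallel` are made up for this example, and `count_tokens` is just a stand-in for the real, slower per-game parser. The boundary scan is deliberately simplified: it only tracks `{ ... }` comments (not `;` line comments or `%` escape lines) and assumes every game has some movetext after its tags.

```rust
use std::thread;

/// Fast sequential pass: byte offsets at which each game starts.
/// Assumption: a game starts at a line beginning with `[` that is outside
/// any `{ ... }` comment and follows movetext (or the start of the input).
fn game_starts(pgn: &str) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut in_comment = false; // inside a `{ ... }` comment
    let mut in_headers = false; // inside the tag section of the current game
    let mut offset = 0;
    for line in pgn.split_inclusive('\n') {
        let is_tag_line = !in_comment && line.starts_with('[');
        if is_tag_line {
            if !in_headers {
                starts.push(offset);
                in_headers = true;
            }
            // Braces inside tag values are not comments, so skip the scan.
        } else {
            if !line.trim().is_empty() {
                in_headers = false;
            }
            // PGN brace comments do not nest, so a boolean is enough.
            for b in line.bytes() {
                match b {
                    b'{' => in_comment = true,
                    b'}' => in_comment = false,
                    _ => {}
                }
            }
        }
        offset += line.len();
    }
    starts
}

/// Slice the input into one `&str` per game, using the offsets from above.
fn split_games(pgn: &str) -> Vec<&str> {
    let starts = game_starts(pgn);
    starts
        .iter()
        .enumerate()
        .map(|(i, &start)| {
            let end = starts.get(i + 1).copied().unwrap_or(pgn.len());
            &pgn[start..end]
        })
        .collect()
}

/// Hypothetical stand-in for the slow, fully correct per-game parser.
fn count_tokens(game: &str) -> usize {
    game.split_whitespace().count()
}

/// Parallel pass: run the per-game work on batches of games, keeping order.
fn parse_parallel(pgn: &str, workers: usize) -> Vec<usize> {
    let games = split_games(pgn);
    if games.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    let per_worker = (games.len() + workers - 1) / workers;
    thread::scope(|scope| {
        let handles: Vec<_> = games
            .chunks(per_worker)
            .map(|batch| {
                scope.spawn(move || batch.iter().map(|g| count_tokens(g)).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `parse_parallel(&contents, 4)` on a file that has been read into a `String` would return one result per game, in file order. Note that the first pass is still sequential and has to look at every byte, so this only pays off when the per-game work is much more expensive than the boundary scan.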