chess-library

Allow iterators for PGN parsing

[Open] vladkvit opened this issue 1 year ago • 4 comments

I'm using this library as a helper for feeding data to a machine learning model. To that end, I need to be able to iterate over the games in a PGN file one at a time (as opposed to parsing them all in bulk).

In my code, I do something like this:

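```cpp
#include <istream>

#include "chess.hpp"  // Disservin/chess-library single header

using namespace chess;

// Reads a PGN stream one game at a time instead of all at once.
// NOTE: only compiles after patching the library so that the StreamParser
// members used here (stream_buffer, visitor, pgn_end, in_header, in_body,
// dont_advance_after_body, processHeader(), processBody(), onEnd()) are
// protected instead of private.
class StepParser : pgn::StreamParser<> {
public:
    explicit StepParser(std::istream& stream) : StreamParser{ stream } {}

    // Call once before the first readNextGame(): binds the visitor and fills
    // the stream buffer. Returns false if the buffer could not be filled.
    bool initRead(pgn::Visitor& vis) {
        visitor = &vis;

        if (!stream_buffer.fill()) {
            return false;
        }
        return true;
    }

    // Call once readNextGame() has returned false; runs onEnd() unless
    // pgn_end is already set.
    void finishRead() {
        if (!pgn_end) {
            onEnd();
        }
    }

    // Advances through the stream until one game has ended. Returns true
    // after each completed game, false once the stream is exhausted.
    bool readNextGame() {
        while (auto c = stream_buffer.some()) {
            if (in_header) {
                visitor->skipPgn(false);

                if (*c == '[') {
                    visitor->startPgn();
                    pgn_end = false;

                    processHeader();
                }
            } else if (in_body) {
                processBody();
            }

            if (!dont_advance_after_body) {
                stream_buffer.advance();
            }
            dont_advance_after_body = false;

            if (pgn_end) {
                // Clear the flag so the next call stops at the end of the
                // following game rather than immediately.
                pgn_end = false;
                return true;
            }
        }
        return false;
    }
};
```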

On the library side, I had to switch several members of StreamParser from private to protected.

This is all quite a hack, so I'm wondering whether there could be official support for iterating over games.

vladkvit · Nov 23 '24 21:11

Sorry, I don't quite understand what your problem is. A *.pgn file can contain multiple individual games, and this parser can parse all of the games in the file. startPgn is called at the start of each new game in the file, and onEnd when that game ends. A sample implementation is here: https://github.com/official-stockfish/WDL_model/blob/master/scoreWDLstat.cpp, where we run the parser over multiple PGN files, each of which itself contains many games.

Disservin · Nov 23 '24 23:11
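To make the contrast concrete, here is a self-contained toy of the push/callback model described above. It is only an illustration: `ToyVisitor`, `parseAll` and `CountingVisitor` are hypothetical names, not chess-library's API, and the boundary rule (a `[` tag line that follows movetext starts a new game) is a simplification of real PGN parsing.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical callback interface, loosely modelled on the visitor idea above.
// It is NOT chess-library's pgn::Visitor.
struct ToyVisitor {
    virtual ~ToyVisitor() = default;
    virtual void startGame() = 0;
    virtual void line(const std::string& text) = 0;
    virtual void endGame() = 0;
};

// Push-style parsing: a single call consumes the whole stream and fires the
// callbacks once per game. Simplified boundary rule: the first '[' tag line,
// or a '[' tag line that follows movetext, starts a new game. Blank lines and
// text before the first tag line are ignored.
inline void parseAll(std::istream& in, ToyVisitor& vis) {
    std::string text;
    bool inGame = false;
    bool sawMoves = false;
    while (std::getline(in, text)) {
        if (text.empty()) continue;
        const bool isTag = text[0] == '[';
        if (isTag && (!inGame || sawMoves)) {
            if (inGame) vis.endGame();  // the previous game is complete
            vis.startGame();
            inGame = true;
            sawMoves = false;
        } else if (!isTag) {
            sawMoves = true;
        }
        if (inGame) vis.line(text);
    }
    if (inGame) vis.endGame();  // flush the last game at end of stream
}

// Example visitor that only counts callbacks.
struct CountingVisitor : ToyVisitor {
    int started = 0;
    int ended = 0;
    void startGame() override { ++started; }
    void line(const std::string&) override {}
    void endGame() override { ++ended; }
};
```

Fed a two-game string through `std::istringstream`, a `CountingVisitor` records two `startGame` and two `endGame` calls. Control stays inside `parseAll` until the whole stream is consumed, which is why getting one game per call requires restructuring the loop, as the `StepParser` workaround does.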

Ah, sorry if I wasn't clear. I want a function I can call (as opposed to a callback) that parses a single game from a PGN file each time it is called. In pseudocode, it would look something like this:

```
parser = SetUpParser( pgnpath )
game1 = parser.parseNextGame()
game2 = parser.parseNextGame()
```

vladkvit · Nov 23 '24 23:11
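A minimal sketch of the requested pull-style API, under stated assumptions: `GameReader` and `nextGame()` are hypothetical (not part of chess-library), each call returns one game's raw PGN text rather than a parsed game object, and games are split with a simplified rule: a `[` tag line that follows movetext starts the next game.

```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical pull-style reader: each call to nextGame() consumes exactly one
// game from the stream and returns its raw text, or std::nullopt at the end.
class GameReader {
public:
    explicit GameReader(std::istream& in) : in_(in) {}

    std::optional<std::string> nextGame() {
        std::string game;
        bool sawMoves = false;
        // Start with the tag line held over from the previous call, if any.
        if (!pending_.empty()) {
            game = pending_ + '\n';
            pending_.clear();
        }
        std::string text;
        while (std::getline(in_, text)) {
            if (text.empty()) continue;  // blank lines are dropped
            if (text[0] == '[' && sawMoves) {
                pending_ = text;  // belongs to the next game; keep it for later
                return game;
            }
            if (text[0] != '[') sawMoves = true;
            game += text + '\n';
        }
        if (game.empty()) return std::nullopt;  // stream exhausted
        return game;  // last game in the stream
    }

private:
    std::istream& in_;
    std::string pending_;  // first tag line of the next game, if already read
};
```

Because the first tag line of the next game is only recognized after it has been read, the reader keeps it in `pending_` and starts the next call with it.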

I see. I might add this, but I'm not sure about returning a game object.

Disservin · Nov 24 '24 01:11
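One way to offer a pull API without committing to a game type is to keep the visitor and let the caller request one game per call, which is roughly the shape of the `StepParser` workaround in the first post. The toy below sketches that idea; `LineVisitor`, `StepReader` and `Counter` are hypothetical, and a `[` tag line that follows movetext is assumed to start the next game.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical visitor interface (not chess-library's pgn::Visitor).
struct LineVisitor {
    virtual ~LineVisitor() = default;
    virtual void line(const std::string& text) = 0;
};

// Pull-style stepping without a game object: the caller asks for one game at
// a time and the reader reports that game's lines to the visitor passed in.
// Returns false once the stream holds no more games.
class StepReader {
public:
    explicit StepReader(std::istream& in) : in_(in) {}

    bool readNextGame(LineVisitor& vis) {
        bool any = false;
        bool sawMoves = false;
        if (!pending_.empty()) {  // tag line read during the previous call
            vis.line(pending_);
            pending_.clear();
            any = true;
        }
        std::string text;
        while (std::getline(in_, text)) {
            if (text.empty()) continue;
            if (text[0] == '[' && sawMoves) {
                pending_ = text;  // first line of the next game
                return true;
            }
            if (text[0] != '[') sawMoves = true;
            vis.line(text);
            any = true;
        }
        return any;
    }

private:
    std::istream& in_;
    std::string pending_;
};

// Example visitor that counts the lines of one game.
struct Counter : LineVisitor {
    int lines = 0;
    void line(const std::string&) override { ++lines; }
};
```

The caller decides when to pull the next game while the per-game data still flows through callbacks, so no game object type is needed.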

Speaking of this: if we can have something like "pgn.parseNextGame", why not a std-style iterator, e.g. "++pgn"?

achimste · Apr 19 '25 06:04
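An iterator along those lines could be modelled on `std::istream_iterator`. In the sketch below, `PgnGameIterator` is hypothetical: dereferencing it yields the current game's raw text (not a parsed game), `++` reads the next game, and a default-constructed iterator marks the end. Games are split with a simplified rule: a `[` tag line that follows movetext starts the next game.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical input iterator over the games in a PGN stream.
class PgnGameIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::string;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::string*;
    using reference = const std::string&;

    PgnGameIterator() = default;  // end-of-stream sentinel
    explicit PgnGameIterator(std::istream& in) : in_(&in) { readGame(); }

    reference operator*() const { return game_; }
    pointer operator->() const { return &game_; }

    PgnGameIterator& operator++() {
        readGame();
        return *this;
    }
    PgnGameIterator operator++(int) {
        PgnGameIterator old = *this;
        readGame();
        return old;
    }

    // As with std::istream_iterator: equal if both are at end,
    // or both read from the same stream.
    friend bool operator==(const PgnGameIterator& a, const PgnGameIterator& b) {
        return a.in_ == b.in_;
    }
    friend bool operator!=(const PgnGameIterator& a, const PgnGameIterator& b) {
        return !(a == b);
    }

private:
    // Reads one game into game_; sets in_ to nullptr (end) once no game is left.
    void readGame() {
        game_.clear();
        bool sawMoves = false;
        if (!pending_.empty()) {  // tag line read during the previous step
            game_ = pending_ + '\n';
            pending_.clear();
        }
        std::string text;
        while (std::getline(*in_, text)) {
            if (text.empty()) continue;
            if (text[0] == '[' && sawMoves) {
                pending_ = text;  // first line of the next game
                return;
            }
            if (text[0] != '[') sawMoves = true;
            game_ += text + '\n';
        }
        if (game_.empty()) in_ = nullptr;  // no more games: become end iterator
    }

    std::istream* in_ = nullptr;
    std::string game_;
    std::string pending_;
};
```

Typical use is `for (PgnGameIterator it(stream), end; it != end; ++it) { /* use *it */ }`. A range-based `for` loop would additionally need a small wrapper type providing `begin()` and `end()`.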