Create ParserBuilder class and build-parsers binary
I cleaned up the grammar/parserBuilder.php file a bit and moved all the potentially reusable code into a new ParserBuilder class, which can be instantiated via the ParserBuilderFactory class.
Then I moved grammar/parserBuilder.php into the bin folder and added it to composer.json, so a rebuild can be triggered by simply running bin/build-parsers.
I also modified the node-resolving regex so that it can now handle fully qualified class names (with a leading \).
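The actual regex change isn't shown here, so what follows is only a hypothetical, simplified sketch of the idea. It assumes the grammar uses a `Name[args]` shorthand that gets rewritten into a `new Name(args, attributes())` call. The function name `resolveNodesSketch`, the exact output format, and the restriction to non-nested brackets are assumptions for illustration. The key point: with a pattern anchored by `\b`, a name like `\Foo\Bar` cannot match at its leading backslash (a space followed by `\` is not a word boundary), so the match would start at `Foo` and leave a stray `\` in front of the generated `new`. A negative lookbehind plus an optional leading `\` avoids that.

```php
<?php
// Hypothetical, simplified sketch: rewrite grammar shorthand such as
// "Stmt\Echo_[$2]" or "\Foo\Bar[$1]" into "new <Name>(<args>, attributes())".
// Nested brackets inside the arguments are NOT handled here.

function resolveNodesSketch(string $code): string
{
    // (?<![\w\\])            the match must not start inside an identifier or
    //                        namespace path (replaces a \b anchor, which fails
    //                        before a leading backslash)
    // \\?                    optional leading backslash for fully qualified names
    // [A-Z][A-Za-z0-9_\\]*+  class name, possibly with namespace separators
    $regex = '~(?<![\w\\\\])(?<name>\\\\?[A-Z][A-Za-z0-9_\\\\]*+)\[(?<params>[^\[\]]*)\]~';

    return preg_replace_callback(
        $regex,
        function (array $m): string {
            $params = trim($m['params']);
            // Append the attributes() call after any existing arguments.
            $args = $params === '' ? '' : $params . ', ';
            return 'new ' . $m['name'] . '(' . $args . 'attributes())';
        },
        $code
    );
}

echo resolveNodesSketch('$$ = Stmt\Echo_[$2];'), "\n";    // $$ = new Stmt\Echo_($2, attributes());
echo resolveNodesSketch('$$ = \Foo\Bar[$1, $3];'), "\n";  // $$ = new \Foo\Bar($1, $3, attributes());
```

Because the lookbehind also rejects a preceding backslash, the match always begins at the leading `\` of a fully qualified name rather than partway through it.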
What is the motivation behind exporting the parser generation as part of the public, stable API?
I want to use it for grammar files other than the php5 and php7 yacc files. That way I can reuse this code instead of copying everything into a new project...
Maybe a separate repository for the yacc parser builder/generator would also be a nice solution, so I wouldn't have to pull in all the other PHP-Parser code (all the node classes, etc.).
Or I could just remove bin/build-parsers again and restore the file in the grammar directory, having it simply use the ParserBuilder class.
So, I'm mainly concerned about two things:
First, the parser generation code is quite tightly coupled to the needs of this project, even with the additional abstraction you've introduced. The parser macros are specific to parsing PHP, the unusual splitting of token declarations is an artifact of having two parsers with a single lexer, and the parser implementation itself contains large amounts of PHP-specific code. It's certainly possible to reuse this code directly for other grammars, but it would carry along quite a lot of unnecessary baggage.
Second, and more importantly, this exports something that was previously internal helper code as part of the public API. This means I will no longer be able to make changes to the parser generation in a stable branch, or at least such changes will have to be made a lot more carefully (for example, can I still change the definition of a macro in a stable branch?). Given that there will probably be very few people interested in using this, and given that parser generation is not what this library does, I don't think this is worthwhile.
As such, I would go with copy-pasting the code in this case... or, if this is useful for more than one project, creating a library just for parser generation.
You're absolutely right that there is PHP-specific code in it, but it also handles a lot of general things one needs to create any parser (for example, initializing an array and pushing elements into it). Plus there's the whole kmyacc integration, which I guess would be the same for many other programming languages. So I think a dedicated repository would make a lot of sense, offering a public API for the parts that are worth generalizing. Everyone could then extend the basic ParserBuilder class from that new repository and add features to fit the grammar of the language they want to parse.
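To make the "general things" argument concrete, here is a hypothetical sketch of grammar-independent semantic-action macros of the kind described above: one that initializes an array and one that pushes an element onto it. The macro names `init` and `push`, their expansions, and the function `expandMacrosSketch` are assumptions for illustration, not the actual ParserBuilder API. Arguments are split on plain commas, so nested calls containing commas are not supported.

```php
<?php
// Hypothetical, simplified sketch of generic action macros:
//   init(a, b)  ->  $$ = [a, b]
//   push(x, y)  ->  x[] = y; $$ = x
// Arguments are split on commas only; nested parentheses are not supported.

function expandMacrosSketch(string $code): string
{
    return preg_replace_callback(
        '~\b(?<name>init|push)\((?<args>[^()]*)\)~',
        function (array $m): string {
            // Split the raw argument list and trim surrounding whitespace.
            $args = $m['args'] === '' ? [] : array_map('trim', explode(',', $m['args']));

            if ($m['name'] === 'init') {
                // Initialize the semantic value as an array of the given elements.
                return '$$ = [' . implode(', ', $args) . ']';
            }

            // push: append the second argument to the first, then pass it on.
            if (count($args) !== 2) {
                throw new InvalidArgumentException('push() expects exactly 2 arguments');
            }
            return $args[0] . '[] = ' . $args[1] . '; $$ = ' . $args[0];
        },
        $code
    );
}

echo expandMacrosSketch('{ init($1); }'), "\n";     // { $$ = [$1]; }
echo expandMacrosSketch('{ push($1, $2); }'), "\n"; // { $1[] = $2; $$ = $1; }
```

Macros like these only manipulate yacc semantic values (`$$`, `$1`, ...), which is why they could plausibly be shared across grammars, whereas node-construction macros stay language-specific.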
Just to teach myself YACC, I successfully wrote a YACC parser using your parser generator code. Sure, it is a pretty primitive parser compared to the PHP parser, but I didn't need to change a single line in your rebuildParser.php script (except for the FQ namespace fix).
@nikic @TiMESPLiNTER Any updates?
@m1guelpf There is some relevant news here: We recently ported kmyacc to PHP at https://github.com/ircmaxell/PHP-Yacc, and once it's more mature, this project will switch to using it and drop the binary kmyacc dependency. PHP-Yacc should also provide the necessary template files etc. in the future, to support simple parsers without additional configuration. So basically this is going down the "separate library" route, just with a somewhat more ambitious scope than a thin wrapper around kmyacc.
We have since moved to using phpyacc, and https://github.com/nikic/PHP-Parser/pull/770 has split up the files to make reuse a bit simpler, which is about as far as I'm willing to go here.