[for reference] all work done which is not in the original repo
Since a lot has been done and several of these features are tough to 'extract cleanly' to produce 'simple' patches (they won't be simple anyway), here is the list of differences (features and fixes in the derived repo):
[to be completed]
Main features
- full Unicode support (okay, astral codepoints are hairy and only partly supported) in lexer and parser
- lexer can handle XRegExp `\pXXX` unicode regex atoms, e.g. `\p{Alphabetic}`
- jison auto-expands and re-combines these when used inside regex set expressions in macros, e.g. `ALPHA [{UNICODE_LETTER}a-zA-Z_]` will be reduced to the equivalent of `ALPHA [{UNICODE_LETTER}_]`, hence you don't need to worry your regexes will include duplicate characters in regex `[...]` set expressions.
- parser rule names can be Unicode identifiers (you're not limited to US ASCII there).
- lexer macros can be used inside regex set expressions (in other macros and/or lexer rules); the lexer will barf a hairball (i.e. throw an informative error) when the macro cannot be expanded to represent a character set without causing counter-intuitive results. E.g. this is a legal series of lexer macros now (see the first sketch after this list for one way they might be used in lexer rules):

  ```
  ASCII_LETTER          [a-zA-Z]
  UNICODE_LETTER        [\p{Alphabetic}{ASCII_LETTER}]
  ALPHA                 [{UNICODE_LETTER}_]
  DIGIT                 [\p{Number}]
  WHITESPACE            [\s\r\n\p{Separator}]
  ALNUM                 [{ALPHA}{DIGIT}]
  NAME                  [{ALPHA}](?:[{ALNUM}-]*{ALNUM})?
  ID                    [{ALPHA}]{ALNUM}*
  ```
- the parser generator produces optimized parse kernels: any feature you do not use in your grammar (e.g. `error` rule driven error recovery or `@elem` location info tracking) is rigorously stripped from the generated parser kernel, producing the fastest possible parser engine.
- you can define a custom written lexer in the grammar definition file's `%lex ... /lex` section in case you find the standard lexer too slow for your liking or otherwise insufficient. (This is done by specifying a no-rules lexer with the custom lexer placed in the lexer trailing action code block.)
- you can `%include` action code chunks from external files, in case you find that the action code blurbs obscure the grammar's / lexer's definition. Use this when you have complicated/extensive action code for rules or a large amount of 'trailing code' ~ code following the `%%` end-of-ruleset marker. (See the second sketch after this list.)
- CLI: `-c 2` -- you now have the choice between two different table compression algorithms:
  - mode 2 creates the smallest tables,
  - mode 1 is the one available in 'vanilla jison' and
  - mode 0 is 'no compression what-so-ever'
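To make the Unicode and lexer-macro items above concrete, here is a minimal, hypothetical `%lex ... /lex` fragment in the same style as the macro series shown above. The macro subset, token names and rule actions are made up for illustration; they are not taken from the jison-gho sources. (Parser rule names in the grammar part of such a file may likewise be Unicode identifiers.)

```
%lex

// macro definitions: XRegExp \p{...} atoms combined with other macros inside [...] sets
UNICODE_LETTER      [\p{Alphabetic}]
ALPHA               [{UNICODE_LETTER}_]
DIGIT               [\p{Number}]
WHITESPACE          [\s\r\n\p{Separator}]
ALNUM               [{ALPHA}{DIGIT}]
ID                  [{ALPHA}]{ALNUM}*

%%

// lexer rules referencing the macros; the token names are illustrative only
{WHITESPACE}+       /* skip whitespace, including Unicode separators */
{DIGIT}+            return 'NUMBER';
{ID}                return 'IDENTIFIER';
.                   return 'INVALID';

/lex
```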
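And a hedged sketch of the `%include` feature mentioned above: the file paths, the example rule and the exact quoting are assumptions for illustration only. The point is simply that an external file can stand in where an action code chunk would otherwise appear, including the trailing code after the `%%` end-of-ruleset marker:

```
// hypothetical file paths below; `%include` stands in where a { ... } action
// code chunk would otherwise appear
expression
    : expression '+' expression
        %include "actions/add-expression.js"
    | NUMBER
    ;

%%

// the 'trailing code' following the %% end-of-ruleset marker can be pulled in
// from an external file as well:
%include "parser-support-code.js"
```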
Minor 'Selling Points'
- you can produce parsers which do not include a `try ... catch` wrapper for that last bit of speed and/or when you want to handle errors in surrounding userland code.
- all errors are thrown using parser- and lexer-specific `Error`-derived classes, which allows userland code to discern which type of error (and thus: which extra error information is available!) is being processed via a simple/fast `instanceof` check for either of them. (See the first sketch following this list.)
- the jison CLI tool will print additional error information when a grammar parse error occurred (derived off / closely related to #321 and #258)
- the jison CLI tool will print parse table statistics when requested (`-I` commandline switch) so you can quickly see how much table space your grammar is consuming. Handy when you are optimizing your grammar to reduce the number of states per parse for performance reasons.
- includes [a derivative or close relative of] #326, #316, #302, #290, #284
- fixes:
  - #358 (crashes on `this.yy.parser` missing errors)
  - #356 (wrong input attached to error)
  - #353 (crashes on `this.yy.lexer` missing errors)
  - #352 (token_stack label issue: jison-gho's way of code stripping does not depend on labels at all, so the issue is moot now)
  - #349 (YYRECOVERING macro support -- should work, fingers crossed :wink:)
  - #348 (`performAction` invocation trouble)
  - #333 (lexer recognizes literal regex parts without quotes whenever possible),
  - #328 (all errors are `Error`-derived instances with a text message and extra info attached),
  - #317 (?not sure?),
  - #313,
  - #301,
  - #299 (with minor additional abilities compared to vanilla jison, e.g. configurable error recovery search depth),
  - #296 (unused grammar rules are reported and nuked, i.e. not included in the generated output),
  - #282,
  - #276 (and we support JSON5 format besides!),
  - #254,
  - #239 (all parser stacks are available in all grammar rule action code via `yyvstack`, `yysstack`, etc. -- documented in the API comment chunk at the top of the generated grammar file),
  - #233 (EBNF rewriting to BNF now works; see also the wiki),
  - #231,
  - #218 (and `parseError` can now produce a return value for the parser to return to the calling userland code),
  - #210,
  - #175 (kind of..., we now support `%include filepath` statements instead of any code chunk),
  - #165 (kind of... now jison does not fetch look-ahead when the rule reduce action doesn't need it; it requires intimate understanding of your grammar and the way this LALR grammar engine handles it, but you can once again code 'lexer hacks' from inside parser rules' action code. Shudder or rejoice, depending on your mental make-up ;-) ),
  - #138 (`instanceof` of parser and lexer error classes),
  - #121 (indirectly, you can now do this by writing an action code chunk for an initial 'epsilon' rule and get this behaviour that way; see the second sketch after this list)
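Illustrating the `instanceof` selling point above, here is a minimal userland sketch in JavaScript. The error class names `JisonParserError` / `JisonLexerError`, the module path and the `hash` property are assumptions made for this sketch; check the API comment chunk at the top of your generated parser file for the exact names your build exposes.

```js
// Hedged sketch: distinguishing lexer errors from parser errors in userland code.
const { parser } = require('./generated-parser');        // path is illustrative

let ast;
try {
    ast = parser.parse('some input text');
} catch (e) {
    if (e instanceof parser.JisonLexerError) {            // assumed class name
        console.error('lexer error:', e.message, e.hash); // extra info assumed on e.hash
    } else if (e instanceof parser.JisonParserError) {    // assumed class name
        console.error('parser error:', e.message, e.hash);
    } else {
        throw e;   // not a jison error: re-throw for the surrounding code to handle
    }
}
```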
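And for the #121 item: a hedged grammar-fragment sketch of the 'initial epsilon rule' trick. The rule names, the `IDENTIFIER`/`EOF` tokens (assumed to come from the lexer) and the `yy.symbolTable` setup are made up for illustration only.

```
%start program

%%

program
    : init statements EOF
        { return $2; }
    ;

// empty ('epsilon') rule: its action is reduced before any real input is,
// which gives you a hook for one-time initialization work
init
    : /* epsilon */
        { yy.symbolTable = {}; }
    ;

statements
    : statements IDENTIFIER
    | IDENTIFIER
    ;
```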
Where is this thing heading?
- using `recast` et al to analyze rule action code, so that both parser and lexer can be code-stripped to produce fast parse/lex runs. Currently only the parser gets analyzed (a tad roughly) to strip costly operations from the parser run-time to make it fast / efficient.
- also note https://github.com/GerHobbelt/jison/issues/16: moving towards a babel-style monorepo. This work has now completed (Oct-Nov 2017: jison-gho releases 0.6.1-200+).
@GerHobbelt What about the failing tests? Any way to help out?