nearley
An example showing significant whitespace?
I'm having difficulty creating a parser for a language like Pug. I haven't tried using an external lexer, but I have a sneaking suspicion it is necessary.
Can you provide an example which shows how to do it?
significant whitespace
As in multiple contiguous whitespace characters?
OMS -> [\s]:* # optional multi-line whitespace
RMS -> [\s]:+ # required multi-line whitespace
No, as in scopes delimited by indented sections of text (à la Pug, Python, Haskell, etc.)
Ah, so indent / dedent... that will be a context-aware parsing solution.
Use local state
You could get away with creating and updating a local context object in the grammar's post-processing step, e.g.:
LINES
-> LINES RBS LINE {% d => {
// d[0] is the state object produced by the preceding LINES match
return updateState(d)
} %}
| LINE {% d => createState(d) %}
RBS -> OWS LF OMS # required break space
OMS -> [\s]:* # optional multi-line space
OWS -> [ \t\r]:* # optional white space
LF -> "\n"
This technique poses a few challenges and limitations, but it's one way to go about this without creating your own lexer.
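To make the idea concrete, here is a rough sketch of what createState and updateState could look like. Both are placeholders in the grammar above, so everything here is an assumption: I'm assuming each LINE postprocessor returns an object of the form { indent, text }, where indent is the number of leading columns. (With the grammar as posted, OMS can absorb leading spaces, so LINE would have to capture them itself.) The helper addLine is hypothetical too.

```javascript
// Hypothetical helpers for the LINES rule. Assumption: LINE yields
// { indent: <leading columns>, text: <string> }.

// "LINE" alternative: d = [line]. Start a tree under a synthetic root.
function createState(d) {
  // indent -1 guarantees the root is never closed by a real line.
  const root = { text: null, indent: -1, children: [] };
  return addLine({ root, open: [root] }, d[0]);
}

// "LINES RBS LINE" alternative: d = [state, breakSpace, line].
function updateState(d) {
  return addLine(d[0], d[2]);
}

// Attach a line to the nearest open block that is indented less than it.
// Note: this mutates the state in place, which may misbehave if the
// grammar is ambiguous and more than one derivation shares the state.
function addLine(state, line) {
  const node = { text: line.text, indent: line.indent, children: [] };
  while (state.open[state.open.length - 1].indent >= node.indent) {
    state.open.pop(); // this block ended (dedent or sibling)
  }
  state.open[state.open.length - 1].children.push(node);
  state.open.push(node);
  return state;
}

// Simulate what the parser would feed the postprocessors.
let state = createState([{ indent: 0, text: "ul" }]);
state = updateState([state, null, { indent: 2, text: "li one" }]);
state = updateState([state, null, { indent: 2, text: "li two" }]);
state = updateState([state, null, { indent: 0, text: "p" }]);
console.log(JSON.stringify(state.root.children, null, 2));
```

The stack of open blocks is the "local state": a deeper line becomes a child of the last line, and a shallower line pops blocks until it finds its parent.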
Use a custom lexer
This is probably the simpler way of parsing indent / dedent, as your sneaking suspicion hinted. (I haven't tried it myself yet.) I found the following on moo's issue tracker for context-aware indent / dedent parsing: https://github.com/no-context/moo/issues/55. The last link there (moo-indentation-lexer) is the one you probably want.
Then, following the nearley docs:
@{%
const moo = require("moo")
const IndentationLexer = require('moo-indentation-lexer')
// Create a lexer from rules
const mooLexer = moo.compile({ ... })
// Create an indentation-aware lexer using the lexer
const lexer = new IndentationLexer({ lexer: mooLexer })
%}
# Pass your lexer object using the @lexer option:
@lexer lexer
BLOCK -> HEADING %indent STATEMENTS %dedent
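For intuition about where the %indent and %dedent tokens come from, here is a minimal standalone sketch of the usual indentation-stack approach. It is not moo-indentation-lexer's actual implementation, and the function name tokenizeIndentation and the token shapes are made up for illustration. It assumes one statement per line and counts each leading whitespace character (tab or space) as one column.

```javascript
// Sketch only: turn leading whitespace into indent / dedent tokens.
function tokenizeIndentation(source) {
  const tokens = [];
  const levels = [0]; // stack of currently open indentation widths
  const top = () => levels[levels.length - 1];

  for (const rawLine of source.split("\n")) {
    if (rawLine.trim() === "") continue; // blank lines don't change depth

    const width = rawLine.length - rawLine.trimStart().length;

    if (width > top()) {
      // Deeper than the current block: open a new one.
      levels.push(width);
      tokens.push({ type: "indent" });
    }
    while (width < top()) {
      // Shallower: close blocks until we are back at a known level.
      levels.pop();
      tokens.push({ type: "dedent" });
    }
    if (width !== top()) {
      throw new Error(`inconsistent dedent to column ${width}`);
    }
    tokens.push({ type: "line", value: rawLine.trim() });
  }

  // Close any blocks still open at end of input.
  while (levels.length > 1) {
    levels.pop();
    tokens.push({ type: "dedent" });
  }
  return tokens;
}

const types = tokenizeIndentation("ul\n  li one\n  li two\np").map(t => t.type);
console.log(types.join(" "));
// prints "line indent line line dedent line"
```

Once the lexer emits tokens like these, the grammar itself stays context-free: a rule such as BLOCK only has to match an indent token, the statements, and the matching dedent.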