Extending python grammar
So I have tried the import_lark branch out (https://github.com/lark-parser/lark/pull/1446). I must admit, I am a complete beginner with Lark and so I'm not sure if my error is due to that, or due to something else.
What I am trying to do is create a grammar and parser for Snakemake, which is a DSL built on top of Python, i.e., any Python syntax is valid Snakemake syntax, and then there is Snakemake-specific syntax on top of that.
I want to keep the Snakemake grammar definition separate from the Python grammar, hence why I stumbled across this issue.
Here is a small example of what I was trying to do (using lark installed from the linked branch, import_star):
from lark import Lark
lark = Lark(
r"""
%import python.*
start: file_input
ruledef: "rule" NAME ":" inputs outputs
inputs: "input:" files
outputs: "output:" files
files: (FILE_NAME)+
FILE_NAME: /[a-zA-Z0-9_\.\/]+/
"""
)
snakefile = """x = 42
rule foo:
input: 'foo.txt'
"""
def parse_snakemake_file():
    return lark.parse(snakefile)
When I try to import the parse_snakemake_file function and run it, I get the following:
p = snakemake_parser.parse_snakemake_file()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/michael/Projects/snakemake-parser/src/snakemake_parser/__init__.py", line 27, in parse_snakemake_file
return lark.parse(snakefile)
^^^^^^^^^^^^^^^^^^^^^
File "/home/michael/Projects/snakemake-parser/.venv/lib/python3.12/site-packages/lark/lark.py", line 655, in parse
return self.parser.parse(text, start=start, on_error=on_error)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/michael/Projects/snakemake-parser/.venv/lib/python3.12/site-packages/lark/parser_frontends.py", line 104, in parse
return self.parser.parse(stream, chosen_start, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/michael/Projects/snakemake-parser/.venv/lib/python3.12/site-packages/lark/parsers/earley.py", line 280, in parse
to_scan = self._parse(lexer, columns, to_scan, start_symbol)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/michael/Projects/snakemake-parser/.venv/lib/python3.12/site-packages/lark/parsers/xearley.py", line 152, in _parse
to_scan = scan(i, to_scan)
^^^^^^^^^^^^^^^^
File "/home/michael/Projects/snakemake-parser/.venv/lib/python3.12/site-packages/lark/parsers/xearley.py", line 125, in scan
raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan},
lark.exceptions.UnexpectedCharacters: No terminal matches ' ' in the current parser context, at line 1 col 2
x = 42
^
Expected one of:
* __ANON_5
* __ANON_13
* LPAR
* __ANON_18
* EQUAL
* VBAR
* DOT
* MORETHAN
* SEMICOLON
* __ANON_6
* PERCENT
* __ANON_17
* __ANON_2
* __ANON_16
* COLON
* __ANON_21
* AMPERSAND
* CIRCUMFLEX
* COMMA
* IN
* __ANON_11
* __ANON_10
* SLASH
* __ANON_12
* LESSTHAN
* __ANON_22
* __ANON_8
* IF
* __ANON_7
* __ANON_20
* NOT
* __ANON_15
* AND
* _NEWLINE
* MINUS
* __ANON_23
* __ANON_3
* __ANON_14
* PLUS
* __ANON_1
* __ANON_9
* LSQB
* OR
* STAR
* IS
* AT
* __ANON_19
* __ANON_4
Again, this could be my misunderstanding. I wasn't certain what to use for start, as that doesn't seem to be defined in the Python grammar.
Also, I am happy to move this to a separate issue so as not to clutter this issue.
Originally posted by @mbhall88 in https://github.com/lark-parser/lark/issues/1397#issuecomment-2296242040
Response from @erezsh (https://github.com/lark-parser/lark/issues/1397#issuecomment-2296285568)
@mbhall88 Feel free to open a new discussion/issue.
Anyway, when importing grammars, you still have to repeat all the relevant %ignore statements in the root grammar (in this case, all the ignores inside python.lark).
start is just the default root name. You can change it by providing the "start" parameter to Lark.
> start is just the default root name. You can change it by providing the "start" parameter to Lark.
Sorry, I don't quite understand what this means? If this is documented, feel free to just point me at that.
https://lark-parser.readthedocs.io/en/stable/classes.html#lark.Lark
See the "start" parameter. You can use start="file_input"