leftry
leftry copied to clipboard
Leftry - A left-recursion enabled recursive-descent parser combinator library for Lua.
Leftry - A left recursion enabled recursive-descent parser combinator library.
This library is for creating and composing parsers.
For example:
local grammar = require("leftry")
local factor = grammar.factor
local span = grammar.span
# Declaring a Non-Terminal, "A"
local A = factor("A", function(A) return
span(A, "1"), # 1st alternative, A := A "1"
"1" # 2nd alternative, A := "1"
end)
# Declaring a Non-Terminal, "B"
local B = factor("B", function(B) return
span(B, "2"), # 1st alternative, B := B "2"
A # 2nd alternative, B := A
end)
# Using the composed parser.
# The first argument is the input string.
# The second argument is the string index to start from.
print(B("111122222", 1))
This creates a parser B that can parse the string "111122222".
The purpose of the anonymous function in declaration of the non-terminal enables self-reference and referencing other non-terminals that are not fully initialized yet.
Install
luarocks install --server=http://luarocks.org/dev leftry
Algorithm
First trace with a left-factored version of the grammar, then apply the left-recursive grammar. Like how a human would intuitively do it.
Other Top-down left recursion enabled parser combinator implementations
- http://hafiz.myweb.cs.uwindsor.ca/xsaiga/fullAg.html
- https://github.com/djspiewak/gll-combinators
Running unit tests
lua test.lua
Usage
- A parser has the following function signature.
rest, values = parser(invariant, position, [peek])
restis the next index to parse.rest-1is the last index of the parsed value. Ifrestisnil, it means the parse is invalid.valuesis the data created as a result of a successful parse.invariantis the Lua string that is being parsed.positionis the integer index to start the parse from. It must be 1 or greater.peek(optional). If true, validates only, andvalueswill benil. Validating without creating the data is several times faster.
-
As iterator:
The function signature of a parser allows it to be used as a Lua iterator in a for-loop.
local actual = {} for rest, values in span("1", "2", "3"), "123123123", 1 do table.insert(actual, rest) end -- actual == {4, 7, 10}This can be useful, for example in a programming language parser, to iterate through each parsed statement.
-
Composition:
Parsers can be nested. Left recursion is allowed.
local A = factor("A", function(A) return span(A, "1"), "1" end) local B = factor("B", function(B) return span(B, "2"), A end) local src = "11112222" assert(B(src, 1, true) == #src + 1) -
Data initialization:
You can customise how the data is generated from a parse.
local A = factor("A", function(A) return span(A, "1") % function(initial, value) return (initial or "") .. value end, "1" end) local src = "111" A(src, 1) -- returns #src + 1, "111"
Left recursion
Leftry can handle some examples of left recursion.
-
Simple left recursion:
local A = factor("A", function(A) return span(A, "1"), "1" end) -
Nested left recursion:
local A = factor("A", function(A) return span(A, "1"), "1" end) local B = factor("B", function(B) return span(B, "2"), A end)
Performance
Performance of the built-in Lua parser.
- Macbook Pro 2.6 Ghz i7 with 16 GB RAM:
- Lua:
- Validate the grammar of around 0.4 megabytes of Lua per second.
- Parse 0.1 megabytes of Lua into abstract syntax tree representation per second.
- LuaJIT:
- Validate the grammar of around 4 megabytes of Lua per second.
- Parse 0.68 megabytes of Lua into abstract syntax tree representation per second.
- For comparison:
- Lua interpreter can load a 15 megabyte Lua function in one second.
- LuaJIT can load a 25 megabyte Lua function in one second.
- Lua:
Elements
-
factor(name, generate, [initializer])Create a non-terminal element.
nameis the tostring value of the element.generateis the function that, when called with this element, returns the definition of this non-terminal. The values returned with this function will be wrapped in anany. You may optionally, explicitly return a singleanythat contains all the alternatives. Strings literals are automatically converted intotermelements.initializeris the function that will be called with values parsed from this element to let the user convert the parsed value into something useful. See "Data Constructors" section.
Usage:
local A = factor("A", function(A) return span(A, "1"), "1" end) -
rep(element, [reducer])Create a repeating element. It can be used only in an
anyor aspan.elementis the element that can appear 0 or more times (if this element is in aspan), or 1 or more times (if this element is in anany). Strings literals are automatically converted intotermelements.reduceris the function that will be called with values parsed from this element to let the user convert the parsed values into something useful. See "Data Constructors" section.
Usage:
span( "1", rep("2", function(a, b) return (a or "")..b end), "3") -
opt(element)Create an optional element. It can be used only in a
span.elementis the element that can appear 0 or 1 times. Strings literals are automatically converted intotermelements.
Usage:
span("1", opt("2"), "3") -
any(...)Create an any element. The any element contains a set of alternatives, that will be attempted from left to right.
The
anyelement is used internally to wrap the alternatives returned by thefactor"generate" function.You do not need to worry about using
any. -
span(...)Create an span element. The span element can be assigned a reducer with the
%operator. See the "Data Constructors" section....the children of the span element. Each child must be encountered in order to provide a valid parse, unless it is anoptorrepelement.
Usage:
local A = factor("A", function(A) return span(A, "1") % function(initial, value) return (initial or "") .. value end, "1" end)The span element can also be assigned a spacing rule using the
^operator:local function span(...) -- Apply spacing rule to all spans we use in the Lua grammar. return grammar.span(...) ^ {spacing=spacing, spaces=" \t\r\n"} endSee built-in Lua parser for an example on what spacing function looks like.
-
term(literal, [initializer])Create a literal element.
literalthe string literal that must be encountered to provide a valid parse.initializeris the function that will be called with the literal whenever there is a valid parse.
Usage:
term("hello")
Data Constructors
Initializer
-
factorfunction( value, -- The parsed value to transform. self, -- The element responsible for parsing being transformed. position, -- The index where `value` was found. rest, -- The next index after `value` ends. choice) -- The index of the alternative `value` was parsed from. -- Default implementation return value end -
termfunction( value, -- The parsed value to transform. (The literal.) self, -- The element responsible for parsing being transformed. position, -- The index where `value` was found. rest) -- The next index after `value` ends. -- Default implementation return value end
Reducer
Reducers are functions that will be folded over each value that will be parsed.
-
spanfunction( accumulated, -- The accumulated value. In the first iteration it is `nil`. value, -- The current value that is parsed. self, -- The element parsing the current value. position, -- The index where the current value begins. rest, -- The next index after `value` ends. i) -- `value` belongs to the `i`th element of this `span` element. -- Default implementation return rawset(initial or {}, i, value) end -
repfunction( accumulated, -- The accumulated value. In the first iteration it is `nil`. value, -- The current value that is parsed. self, -- The element parsing the current value. position, -- The index where the current value begins. rest, -- The next index after `value` ends. i) -- The `i`th time the child element has been encountered. -- Default implementation return rawset(initial or {}, i, value) end
Caveats
The current implementation does not enforce the following rules properly.
-
A
spanmust have more than one child. -
In a left recursion alternative, only the first element may be the left-recurring non-terminal. More than one consecutive left-recurring non-terminal is not supported, even if it currently works.
-- This is OK local A = factor("A", function(A) return span(A, "1", A, "2"), "1" end) -- This is not OK local A = factor("A", function(A) return span(A, A, "1", "2"), "1" end) -
An
anyelement must not have anyoptchildren. -
A
repelement that is a child of ananyelement requires 1 or more elements to match. -
A
repelement that is a child of anspanelement requires 0 or more elements to match. -
The first nonterminal element of a span that is part of a left recursion path, cannot be wrapped in
optorrep.
TODO
-
~~Implement Lua grammar in Leftry to prove it can handle the grammar of a programming language.~~
-
~~Implement whitespacing support.~~
-
~~Add appropriate data initializers for the builtin Lua parser, to override the default ones.~~
-
Test the following grammar:
"b" | s ~ s ^^ { _ + _ } | s ~ s ~ s ^^ { _ + _ + _ } https://github.com/djspiewak/gll-combinators