consider making Grammar a hashmap
Having Grammar be a simple list of productions is elegant enough, but it doesn't really model the structure of a BNF grammar properly since there's nothing in it's structure explicitly connecting the nonterminals on the right to the nonterminals on the left, making Grammar a hashmap<Term, Vec<Expression>> would allow users to traverse grammars out of the box and be a good middle-ground between the current flat grammars and a fully connected approach.
i'm still uncertain about this idea and it's relatively low priority for now, in the meantime any feedback would be appreciated.
I think this is a fair point about Grammar. It will have to be a bit more complex than that, because there can be multiple Term -> [Expression]. But the need for this arises in a few places, including the EarleyParser. To still provide convenient iteration, BTreeMap will probably be desirable, and should be a non breaking change for users.
can't multiple Term -> [Expression]'s just be combined into one? it seems semantically equivalent
I think yes it is semantically equivalent, but then in order to track which was used for parsing input, it is easier to use:
ProdId = 0, <start> ::= <a> <b>
ProdId = 1, <start> ::= <c> <d>
// record during parsing that ProdId(1) was used to match input text
compared to:
ProdId = 0, <start> ::= <a> <b> | <c> <d>
// record during parsing that ProdId(0) and Index(1) was used to match input text
maybe silly reason but it certainly helps me when reasoning about the parser.
also another maybe silly reason, but someday I hope to have an API that supports user provided constructors for parse nodes, which having only one Expression per Prod helps:
let my_parser = Grammer::new([
("<start> ::= 'a' 'b'", UserDefinedStart::from_a_and_b),
("<start> ::= 'c' 'd'", UserDefinedStart::from_c_and_d),
]);
// then use that parser
let parsed : UserDefinedStart = my_parser.parse("cd").next().unwrap();
couldn't we just do what we're doing now, using the normalized version for parsing but not exposing it to the user? if they give us s = a / b and s = c we combine it into s = a / b / c but normalize it to s = a, s = b and s = c before parsing
though know that i say that i realize it makes decomposing productions the same way u might compose them impossible
not sure if that's a compelling enough use case though, u can still decompose productions easily enough so long as they're normalized in the first place