First set computation - algorithm correct?
While studying your algorithm for computing the First set (source), I wondered whether it is correct.
Consider a grammar like this:
S -> A | x
A -> S | y
From my understanding of your algorithm, you initially set _firstSets[S] to the empty set. Thus, when doing the recursion on A, the First set of A is computed to be the union of { } (as _firstSets[S] is still empty) and { y }.
However, the First set of A is { x, y }.
What am I missing?
Hm.. yeah, good catch. Even though the grammar itself seems is not parsable, the first set for both non-terminals should be {x, y}. I'll take a look into it, though it's an edge case, and such a grammar seems mostly is of theoretical interested -- doesn't mean we don't have to fix it though.
For the test grammar:
%%
S
: A
| 'x'
;
A
: S
| 'y'
;
I get the:
syntax-cli -g ~/g.g -m lalr1 -t -s first
First set:
┌─────────┬───────────┐
│ Symbol │ First set │
├─────────┼───────────┤
│ $accept │ 'y', 'x' │
├─────────┼───────────┤
│ S │ 'y', 'x' │
├─────────┼───────────┤
│ A │ 'y' │
├─────────┼───────────┤
│ 'y' │ 'y' │
├─────────┼───────────┤
│ 'x' │ 'x' │
└─────────┴───────────┘
Parsing mode: LALR1_BY_SLR(1).
Grammar:
0. $accept -> S
----------------
1. S -> A
2. | 'x'
3. A -> S
4. | 'y'
LALR1_BY_SLR(1) parsing table:
┌───┬─────┬─────┬────────┬───┬───┐
│ │ 'x' │ 'y' │ $ │ S │ A │
├───┼─────┼─────┼────────┼───┼───┤
│ 0 │ s3 │ s4 │ │ 1 │ 2 │
├───┼─────┼─────┼────────┼───┼───┤
│ 1 │ │ │ acc/r3 │ │ │
├───┼─────┼─────┼────────┼───┼───┤
│ 2 │ │ │ r1 │ │ │
├───┼─────┼─────┼────────┼───┼───┤
│ 3 │ │ │ r2 │ │ │
├───┼─────┼─────┼────────┼───┼───┤
│ 4 │ │ │ r4 │ │ │
└───┴─────┴─────┴────────┴───┴───┘
Feel free to send a PR as well, in case you'll reach it faster than me.