libfsm
lx lookahead isn't always necessary
`lx -l c` doesn't generate exactly the code I'd prefer for accepting states which have no further out-transitions.
For example, my sample language specification in #111 generates four states. In state `S2`, we've already seen `..`, so if we see a third `.` then we've matched an `$ellipsis` token. Currently, in that case `lx` generates a transition to a state `S3` and continues with a new iteration of the loop, which calls `lx_getc` and then unconditionally passes the result to `lx_ungetc` before returning `TOK_ELLIPSIS`.
But instead of transitioning to a new state and reading then unreading a new character, when `S2` matches a third `.`, it could immediately return `TOK_ELLIPSIS`.
This applies to any accepting state that has no out-transitions. From such a state, reading any additional character will always trigger an error transition. At that point the state machine must roll back to the most recent accepting state and return that, and we know statically which state that was.
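To make the difference concrete, here is a hand-written sketch of the two shapes. It is not actual `lx` output: every name (`sketch_getc`, `sketch_ungetc`, `lex_with_lookahead`, `lex_direct`, the `SK_*` tokens) is made up for illustration, and it assumes a simplified language where `...` is the only token and anything else is an error, which may not match the specification in #111.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is an illustration, not lx output. */
enum sketch_token { SK_EOF, SK_ERROR, SK_ELLIPSIS };
enum sketch_state { S0, S1, S2, S3 };

struct sketch_input {
    const char *buf;  /* NUL-terminated input */
    size_t pos;       /* index of the next unread character */
    unsigned reads;   /* number of sketch_getc calls, for comparison */
};

static int sketch_getc(struct sketch_input *in) {
    in->reads++;
    if (in->buf[in->pos] == '\0') return -1;  /* end of input, nothing consumed */
    return (unsigned char)in->buf[in->pos++];
}

static void sketch_ungetc(struct sketch_input *in, int c) {
    if (c == -1) return;  /* end of input consumed nothing, so nothing to undo */
    assert(in->pos > 0);
    in->pos--;
}

/* Shape described above: S3 exists only to read one character and push it back. */
static enum sketch_token lex_with_lookahead(struct sketch_input *in) {
    enum sketch_state state = S0;
    for (;;) {
        int c = sketch_getc(in);
        switch (state) {
        case S0:
            if (c == -1) return SK_EOF;
            if (c == '.') { state = S1; continue; }
            return SK_ERROR;
        case S1:
            if (c == '.') { state = S2; continue; }
            return SK_ERROR;
        case S2:
            if (c == '.') { state = S3; continue; }
            return SK_ERROR;
        case S3:
            sketch_ungetc(in, c);  /* whatever we read, give it back */
            return SK_ELLIPSIS;
        }
    }
}

/* Proposed shape: S2 returns the token as soon as it sees the third '.'. */
static enum sketch_token lex_direct(struct sketch_input *in) {
    enum sketch_state state = S0;
    for (;;) {
        int c = sketch_getc(in);
        switch (state) {
        case S0:
            if (c == -1) return SK_EOF;
            if (c == '.') { state = S1; continue; }
            return SK_ERROR;
        case S1:
            if (c == '.') { state = S2; continue; }
            return SK_ERROR;
        case S2:
            if (c == '.') return SK_ELLIPSIS;  /* no S3, no extra read */
            return SK_ERROR;
        case S3:
            return SK_ERROR;  /* unreachable in this variant */
        }
    }
}
```

On the input `...x`, both functions return `SK_ELLIPSIS` and leave `pos` at 3, but the lookahead variant calls `sketch_getc` four times and the direct variant three. The direct variant also has one fewer reachable state to read through.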
This is a minor optimization, but I mostly care because the generated code would be slightly easier to understand if these unnecessary extra states were removed.