libfsm

lx lookahead isn't always necessary

Open jameysharp opened this issue 5 years ago • 0 comments

`lx -l c` doesn't generate quite the code I'd prefer for accepting states that have no out-transitions.

For example, my sample language specification in #111 generates four states. In state `S2` we've already seen `..`, so if we see a third `.` then we've matched an `$ellipsis` token. Currently, in that case, lx generates a transition to a state `S3` and continues with a new iteration of the loop, which calls `lx_getc` and then unconditionally passes the result to `lx_ungetc` before returning `TOK_ELLIPSIS`.

But instead of transitioning to a new state and then reading and unreading another character, `S2` could return `TOK_ELLIPSIS` immediately when it matches a third `.`.

This applies to any accepting state that has no out-transitions. From such a state, reading any additional character will always trigger an error transition. At that point the state machine must roll back to the most recent accepting state and return its token, and we know statically which state that is: the accepting state we are already in.
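To make the difference concrete, here is a minimal sketch of the two shapes. It is not actual lx output: the `struct input` cursor, `in_getc`/`in_ungetc` (stand-ins for `lx_getc`/`lx_ungetc`), the `TOK_UNKNOWN` token, and the assumption that `S1` and `S2` are non-accepting are all hypothetical simplifications, since the real specification lives in #111.

```c
#include <assert.h>
#include <stdio.h>

enum tok { TOK_EOF, TOK_ELLIPSIS, TOK_UNKNOWN };

/* Hypothetical input cursor standing in for lx's getc/ungetc pair. */
struct input {
    const char *p;   /* next unread character */
    int pushed;      /* one-character pushback slot, EOF when empty */
    unsigned reads;  /* number of in_getc calls, to compare the variants */
};

static struct input input_from(const char *s) {
    struct input in = { s, EOF, 0 };
    return in;
}

static int in_getc(struct input *in) {
    in->reads++;
    if (in->pushed != EOF) {
        int c = in->pushed;
        in->pushed = EOF;
        return c;
    }
    return *in->p != '\0' ? (unsigned char) *in->p++ : EOF;
}

static void in_ungetc(struct input *in, int c) {
    assert(in->pushed == EOF); /* only one character of pushback */
    in->pushed = c;
}

/* Current shape: S3 exists only to read one character and unread it. */
static enum tok next_with_lookahead(struct input *in) {
    enum { S0, S1, S2, S3 } state = S0;
    int c;

    while (c = in_getc(in), c != EOF) {
        switch (state) {
        case S0: if (c == '.') { state = S1; continue; } return TOK_UNKNOWN;
        case S1: if (c == '.') { state = S2; continue; } return TOK_UNKNOWN;
        case S2: if (c == '.') { state = S3; continue; } return TOK_UNKNOWN;
        case S3: in_ungetc(in, c); return TOK_ELLIPSIS;
        }
    }

    /* End of input: accept only if we stopped in an accepting state. */
    switch (state) {
    case S0: return TOK_EOF;
    case S3: return TOK_ELLIPSIS;
    default: return TOK_UNKNOWN;
    }
}

/* Proposed shape: S2 returns the token as soon as it sees the third '.'. */
static enum tok next_direct(struct input *in) {
    enum { S0, S1, S2 } state = S0;
    int c;

    while (c = in_getc(in), c != EOF) {
        switch (state) {
        case S0: if (c == '.') { state = S1; continue; } return TOK_UNKNOWN;
        case S1: if (c == '.') { state = S2; continue; } return TOK_UNKNOWN;
        case S2: return c == '.' ? TOK_ELLIPSIS : TOK_UNKNOWN;
        }
    }

    return state == S0 ? TOK_EOF : TOK_UNKNOWN;
}
```

In this sketch, both functions return `TOK_ELLIPSIS` for the input `...x` and leave `x` as the next character, but the first one calls `in_getc` four times and `in_ungetc` once, while the second calls `in_getc` three times and has one fewer state.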

This is a minor optimization, but I mostly care because the generated code would be slightly easier to understand if these unnecessary extra states were removed.

jameysharp · Feb 20 '19 05:02