Syntactic completion
This PR implements completion based on language grammar (this complements the existing completion based on semantic information).
Here is a rough explanation of the algorithm, taken from the Syntactic_completion module:
Generate syntactic completion by analysing the parser stack. The task is split in a few steps:
First enumerate all reachable states by simulating all possible reductions. This is done by the Lookahead, Level and Stack modules. Lookahead keep tracks of a set of lookahead terminals: rather than simulating separately for each possible lookahead token, we regroup lookahead tokens that trigger the same reduction, such that a given reduction is simulated only once. Level keep track of all goto transitions to simulate on a given stack frame (hence the reductions are grouped per stack "level"). Stack modules simulate the stack reduction.
Auxiliary information are provided by:
- Parser_complete.state_to_reduction_table: the list of reductions to simulate when in a given state, already structured per level
- Parser_complete.state_goto_table: a naive but sufficient representation of the LR goto table. TODO: replace it by Menhir builtin goto table when possible (need to patch Menhir).
After that, we have the list of all states that were reached, and for each state, the set of lookahead tokens that led to it. The list is ordered: the state that is the deepest in the stack comes first. This is done to favor completions that will close the most syntactic constructions over completions that might open new nested constructions. In practice, it means that in this example : module M = struct let v = if true then x Completing after the x, the suggestions will be:
- first
end, to close the structure- then
in, to transforme the module let into an expression let- finally
else, to turn theif theninto anif then else. This order will be preserved by subsequent transformations.Then we turn each reached state into an "item set":
- Parser_complete.state_closure_table associates to each state the states that can be reached by following "null" reductions. (e.g. if we are in
let . rec?we can eachlet rec? .by assuming the rec flag is missing: rec? is a nullable reduction) TODO: maybe we can remove this step by simulating the closure from production definition, the runtime cost should be negligible.- Parser_complete.items_table associates a state to its itemset, in the form of a list of pair of (production, dot position)
The item sets are transformed into sequence of symbols by looking them up in Parser_complete.productions table, that contains the definition of each production. Extra step: we need to simulate the "closure" of the itemset. For instance if we have an item that looks like
. expression, we don't want to stop there and just suggest "expression", rather we want to expand the expression to its definition. This is done using Parser_complete.nonterminal_to_productions that lists all the productions that can produce a non-terminal.Now we have a list of symbols that constitutes valid continuations of the current parsing. We need to turn them into readable definitions that can be presented to the user. First, we keep only the ones that starts with tokens we consider "interesting" (mostly keywords) using "is_interesting_terminal". After that, for each starting terminal, we only keep the shortest sentence that can complete it. For instance, { if ... then ... , if ... then ... else ... , let ... = ... , let ... = ... in ... } is simplified too { if ... then ... , let ... = ... } Then we turn terminals into text using Parser_printer and replace non-terminals by "..."
[completion_for_parser] runs to whole pipeline.
TODO:
- [ ] The pipeline is not really efficient, some intermediate structures could be prune early to avoid a lot of redundant computations
- [ ] Add tests to the testsuite
- [ ] Tested in VIM. What about other editors?
Here is an example showing syntactic completion in an expression:

TODO: Syntactic completion engine does not handle well completing when the cursor is on an existing word. Fix handling when the cursor is
- [x] in the middle of a word
- [x] at the end of a word