antlr4-autosuggest-js

Autocompletion not working when token has duplicate letters

Open AlanFoster opened this issue 6 years ago • 2 comments

Hey, I like the idea of this module - but I'm running into a bit of an issue with a simple auto-completion scenario.

I haven't taken a look at the source code yet but the npm module isn't generating any suggestions for an empty string and the following grammar:

grammar Expr;

prog:	SELECT ;

SELECT : S E L E C T;

WS: [ \t\n] -> channel(HIDDEN);

fragment S : [sS];
fragment E : [eE];
fragment L : [lL];
fragment C : [cC];
fragment T : [tT];

When the input is empty, I would have expected an auto-suggestion of select, but no suggestion is given:

{
    "input": "",
    "errors": [],
    "suggestions": []
}

However, if I change the grammar to remove the repeated E letter within SELECT it works as expected:

prog:	SELECT ;

- SELECT : S E L E C T;
+ SELECT : S E L C T;

With the above change to the grammar, the autocomplete now suggests the initial token as expected, without the additional e of course:

{
    "input": "",
    "errors": [],
    "suggestions": [
        "selct"
    ]
}

I am using ANTLR 4.7.1, and the code is mostly from the README:

import { ExprLexer } from './gen/ExprLexer';
import { ExprParser } from './gen/ExprParser';
import ErrorAggregator from './error-aggregator';
import * as autosuggest from 'antlr4-autosuggest';

export function extract(input) {
    const errorAggregator = new ErrorAggregator();
    const autosuggester = autosuggest.autosuggester(ExprLexer, ExprParser, 'LOWER');

    let suggestions = autosuggester.autosuggest(input);

    console.log(JSON.stringify({
        input: input,
        errors: errorAggregator.getErrors(),
        suggestions: suggestions
    }, null, 4));
}

export default extract;

I can create a failing test / provide an example test project to help with debugging, just let me know how I can help :+1:

AlanFoster, Aug 12 '18 18:08

@AlanFoster, thanks for reporting. Your description was very clear and it was easy to reproduce.

From a first look, this is the result of ANTLR4 producing a seemingly "flawed" lexer ATN. Looking at this ATN, I get the impression that SELECT is just as valid as SELELECT, SELELELELECT, and SELELELELELECT. Putting myself in the auto-suggester's place, that ambiguity is what makes me suggest nothing.

However, when testing this grammar with ANTLR4 directly, it seems that ANTLR4 is somehow able to distinguish between these inputs and accepts only SELECT as valid. I will look more deeply into why this is happening, and ask around.
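
To make the apparent loop concrete, here is a toy model. It is a sketch under assumptions, not ANTLR4's real ATN classes and not the auto-suggester's code: the `SELECT` array, `followPositions`, and `accepts` are hypothetical names. It assumes each fragment is a shared sub-graph whose end state links to the follow position of every place the fragment is used. A walk that ignores which call site it came from can then jump from the second E back to the position after the first E, while a walk that remembers its caller cannot.

```javascript
// Toy model of SELECT : S E L E C T, one fragment call per position.
// Hypothetical illustration only; this is not ANTLR4's ATN API.
const SELECT = ['s', 'e', 'l', 'e', 'c', 't'];

// Every position a walk may resume at after `fragment` finishes,
// if the fragment's end state is shared by all of its call sites.
function followPositions(fragment) {
  return SELECT.flatMap((f, i) => (f === fragment ? [i + 1] : []));
}

// Returns true if the toy graph accepts `text`.
// trackCaller = true:  resume only after the call site that was entered.
// trackCaller = false: resume after any call site of the same fragment.
function accepts(text, trackCaller) {
  let positions = new Set([0]);
  for (const ch of text.toLowerCase()) {
    const next = new Set();
    for (const pos of positions) {
      if (pos >= SELECT.length || SELECT[pos] !== ch) continue;
      const resume = trackCaller ? [pos + 1] : followPositions(ch);
      resume.forEach((p) => next.add(p));
    }
    positions = next;
  }
  return positions.has(SELECT.length);
}

console.log(accepts('select', true));    // true
console.log(accepts('selelect', true));  // false
console.log(accepts('selelect', false)); // true
```

In this model a fragment that is used only once has a single follow position, so no loop can appear, which would be consistent with the grammar without the repeated E working.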

oranoran, Aug 12 '18 19:08

@oranoran Thanks for the quick response! I poked around a bit, added the -atn flag to antlr, and copied the results into Graphviz to learn more about ATNs, as they're all new to me.

Interestingly I found that the following grammar doesn't have the same problem:

grammar Expr;

prog:	SELECT ;

SELECT : [sS] [eE] [lL] [eE] [cC] [tT] ;

WS: [ \t\n] -> channel(HIDDEN);

The above grammar generates the correct output now:

{
    "input": "",
    "errors": [],
    "suggestions": [
        "select"
    ]
}
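
For grammars with many keywords, writing the inlined character sets by hand gets tedious. A minimal sketch of a helper that produces a rule in the same shape as the one above; `inlineKeywordRule` is a hypothetical name, and it assumes the keyword contains only ASCII letters:

```javascript
// Builds a case-insensitive lexer rule that uses inline character sets
// instead of shared fragments. Hypothetical helper; assumes ASCII letters only.
function inlineKeywordRule(keyword) {
  const sets = [...keyword].map(
    (ch) => `[${ch.toLowerCase()}${ch.toUpperCase()}]`
  );
  return `${keyword.toUpperCase()} : ${sets.join(' ')} ;`;
}

console.log(inlineKeywordRule('select'));
// SELECT : [sS] [eE] [lL] [eE] [cC] [tT] ;
```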

AlanFoster, Aug 12 '18 22:08