antlr4 [WIP] A working prototype for a suggestions helper

This is an (experimental at this point) implementation of a generic suggestions helper. Given a parsed input and a caret position, it returns a leaf context and expected tokens. The intent is to make this available in Java and Typescript to start with. Other implementations may follow if there is demand.

Mar 15 '24 19:03 ericvergnaud

Thanks for the making this happen!

I played around with the test cases a bit. Here is a test case that does not currently work as I'd expect:

# contents of java-ambiguous-prefix-parse.yml
# hypothetical complete class A)
# input: "public class Foo { public static    boolean myAttribute = 42; }"
# hypothetical complete class B)
# input: "public class Foo { public static    class Inner {} }"
# actual partial input (suppose the user has not typed out the full class yet):
input: "public class Foo { public static    "
caret:
  line: 1
  column: 34
expected:
  - boolean # expected with context chain: PrimitiveTypeContext --parent--> TypeContext --parent--> FieldDeclarationContext ...
  - class   # expected with context chain: ClassDeclarationContext --parent--> MemberDeclarationContext ...

# list incomplete, but at least both of the above should be suggested


# currently, I get the following instead of the expected tokens and contexts:
# tokens: @, private, static, protected, public, final, abstract, strictfp
# context chain: ClassOrInterfaceModifierContext --parent--> ModifierContext --parent--> ...

Edit: In a previous version of this comment, I had the expected cases mixed up.

Mar 19 '24 15:03 seb314

I made some changes, can you retry ?

Mar 20 '24 09:03 ericvergnaud

The suggested tokens for the test case look good now!

Regarding the single parser context for all suggested tokens, I'm a bit puzzled why that works. Is this really always the case, or is the java grammar maybe deliberately written to avoid ambiguities where different parser contexts could occur?

So far, with the java grammar, I did not come up with a test case that demonstrates a clear need for different contexts. I'll have to try either with a toy grammar or with our internal grammar and see what happens there.

Mar 21 '24 16:03 seb314

Regarding the single parser context for all suggested tokens, I'm a bit puzzled why that works. Is this really always the case, or is the java grammar maybe deliberately written to avoid ambiguities where different parser contexts could occur?

TBH the expected tokens relate to a state, not to a context i.e. given a state, the set of expected tokens is constant and not context dependent. That said, you might be right that as you walk up contexts, some tokens may no longer make sense. However I'm not sure how useful it could be to cater for that ? Can you provide an example where it matters ?

I'll be replicating the java code to JS/TS so you can play with it in your exact situation.

Mar 21 '24 18:03 ericvergnaud

@seb314 I've replicated the Java code to JS/TS. Example usage is in runtime/JavaScript/spec/suggestions/SuggestionHelperSpec.js

I haven't tried TypeScript. Feel free to yell if it's not working.

Mar 22 '24 16:03 ericvergnaud

I tried to come up with an example where more than one parser context is consistent with the same input & cursor position: https://github.com/seb314/antlr4/commit/118e126db71e4b44e5c85a0c9465858b6df7266f

Thanks for the JS version! I'll try that with our real grammar and see if I can find real-world examples.

EDIT: updated link to commit with fixed typo in commit message

Mar 22 '24 18:03 seb314

The TS build is broken, but has been for a couple of weeks... I'm looking into it, I don't think it relates to this change

Mar 22 '24 19:03 ericvergnaud

The TS build is broken, but has been for a couple of weeks... I'm looking into it, I don't think it relates to this change

As long as the js is runnable I'll probably be able to just suppress the tsc warnings for initial testing

Mar 22 '24 19:03 seb314

It's fixed now

Mar 22 '24 19:03 ericvergnaud

I mean now 🤣 (previous commit undid all the changes ...)

Mar 22 '24 20:03 ericvergnaud

I've now tried the js version with our regular test cases a bit.

The suggested tokens look mostly good already (modulo a few unwanted suggestions that look like they are plausibly due to the different handling of semantic predicates. Those could probably be filtered out individually, if necessary).

What I'm not sure about at the moment: Currently, getExpectedTokensAt provides the context that corresponds to the given input, without a suggested token appended. (Or inserted/replaced, if the given position is not at the end of the input, i guess). Is there maybe an efficient way to find out the context or rule that corresponds to the input with the suggested token, for each suggested token? One option would be to actually create all those hypothetical inputs, and call the parser on each of them, but that is maybe a lot more inefficient than necessary?

Mar 28 '24 10:03 seb314

I've now tried the js version with our regular test cases a bit.

The suggested tokens look mostly good already (modulo a few unwanted suggestions that look like they are plausibly due to the different handling of semantic predicates. Those could probably be filtered out individually, if necessary).

What I'm not sure about at the moment: Currently, getExpectedTokensAt provides the context that corresponds to the given input, without a suggested token appended. (Or inserted/replaced, if the given position is not at the end of the input, i guess). Is there maybe an efficient way to find out the context or rule that corresponds to the input with the suggested token, for each suggested token? One option would be to actually create all those hypothetical inputs, and call the parser on each of them, but that is maybe a lot more inefficient than necessary?

so you'd want to associate each suggested token with the resulting context if that token were appended ? I don't think that's possible at all because no such context will exist if the cursor is at the end of the input. And if it's before the end, a context will only exist for the token currently inserted at or after the cursor (and the ones after that). The contexts are instances of the actual parsing of the input, not of a potential one.

What might be possible is to provide the rule that the parser would enter. That's something the parser knows of without instantiating anything. But I'd suggest first trying your proposed approach i.e. parse hypothetical inputs, and see how that performs (notably it might be done asynchronously to reduce initial latency).

Apr 06 '24 14:04 ericvergnaud

@ericvergnaud thanks for the clarification! I'll try the approach with separate parsing and then report back (might take a couple of days until I have time to test this)

(Knowing the rule instead of the context should be sufficient to classify the suggestions. I'm not sure yet if it would also be sufficient for finding out the corresponding range of the input that gets replaced if the user accepts the corresponding suggestion, but maybe there is a way)

Apr 07 '24 17:04 seb314

antlr4 antlr4 copied to clipboard

[WIP] A working prototype for a suggestions helper

antlr4
antlr4 copied to clipboard