JavaScript runtime: could the transition classes be exported?
The classes from the `transition/*` subfolder of the JavaScript runtime are not exported in index.node.js and index.web.js. Is this intentional, or could they be exported, similar to e.g. the classes in `atn/*`?
Background: we rely on some of the classes in our autocomplete implementation. With antlr4 v4.9.3, we were able to import them via their path, but now that "exports" are configured in antlr's package.json, the imports by path no longer work.
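For illustration, a hedged sketch of the breakage (the exact pre-4.10 deep-import path shown is hypothetical):

```js
// Before (antlr4 v4.9.3): importing by path worked; hypothetical path, e.g.:
// import { RuleTransition } from 'antlr4/src/antlr4/transition/RuleTransition.js';

// Now: package.json declares "exports", so Node only resolves the listed
// entry points, and any other subpath import fails:
import antlr4 from 'antlr4'; // OK: root entry point is exported
// import { RuleTransition } from 'antlr4/src/antlr4/transition/RuleTransition.js';
//   -> Error [ERR_PACKAGE_PATH_NOT_EXPORTED]
```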
Happy to open a PR if this makes sense.
Mmm... interesting...
Why not simply use ATN.getExpectedTokens? What more do you get from transitions, for the cost of replicating a core ANTLR function?
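For reference, a minimal sketch of that built-in route in the JavaScript runtime:

```js
// A hedged sketch, assuming `parser` is a generated parser that has consumed
// input up to the caret; `state` and `_ctx` are the Parser's current ATN
// state number and rule context (note: _ctx is an internal field).
const expected = parser.atn.getExpectedTokens(parser.state, parser._ctx);
// `expected` is an IntervalSet of the token types that can follow here.
```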
My understanding of why we have a custom implementation is:
- for certain grammar rules, our desired autocomplete suggestions are not individual tokens, but a sequence of tokens that corresponds to the input consumed by the rule. To achieve this, our depth-first search on the ATN has a callback hook for RuleTransitions. Example: a rule that describes a '/'-delimited file system path, where the suggestions are full paths of existing files, rather than individual directory names.
- semantic predicates are taken into account for the suggestions (getExpectedTokens doesn't seem to do this?)
- to find all (state, ctx) pairs that can be reached after consuming the input up to the cursor position, we do a depth-first search on the ATN, where the transitions are the edges of the graph (see the sketch below). I think this would be necessary in a getExpectedTokens-based approach too (in order to find the `stateNumber` and `ctx` parameter values), but maybe there is a higher-level method available that I'm not aware of.
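A minimal sketch of the depth-first walk described above (not our actual code; `onRuleEntered` and `onExpectedTokens` are hypothetical callbacks, and context tracking, rule stop states, and predicate handling are all omitted):

```js
// Walk the ATN outward from `state`, treating transitions as graph edges.
function walk(state, visited, onRuleEntered, onExpectedTokens) {
  if (visited.has(state.stateNumber)) return;
  visited.add(state.stateNumber);
  for (const t of state.transitions) {
    // constructor-name check, since the Transition classes are not exported
    if (t.constructor.name === 'RuleTransition') {
      // hook: e.g. suggest complete file paths when entering a `path` rule
      onRuleEntered(t.ruleIndex);
      walk(t.followState, visited, onRuleEntered, onExpectedTokens);
    } else if (t.isEpsilon) {
      walk(t.target, visited, onRuleEntered, onExpectedTokens);
    } else if (t.label !== null) {
      // a matching transition: t.label is the IntervalSet of token types
      onExpectedTokens(t.label);
    }
  }
}
```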
Thanks for the explanation. Is this source code accessible? The reason I want to dig into this is that in antlr5, transitions will definitely not be accessible, so we need to understand the usage scenario such that it can still be implemented (using other means).
It's not public, but I'm currently checking if I can share it.
@ericvergnaud I can probably send you the relevant code via email. (I'd prepare a self-contained set of files that can be built and run with only public dependencies.) Would this be of interest to you?
It would definitely.
@ericvergnaud I've sent you an email to your address from git
@seb314 I've started implementing a built-in ANTLR solution, so that developers don't need to know about ANTLR internals to compute suggestions. A not-so-obvious finding is that the set of expected tokens at a given caret position does not vary with the enclosing context. So at this point, the API I'm experimenting with is simpler than discussed, as follows (in Java):
```java
Pair<RuleContext, IntervalSet> getExpectedTokensAt(RuleContext startRuleContext, int line, int column) { .../... }
```
The returned context is the most specific context that contains the token (or last token) at the caret position. From there, it's straightforward to walk the hierarchy of contexts upwards (using the `parent` field).
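For completeness, a sketch of that upward walk; in the JavaScript runtime the parent link is named `parentCtx` (the Java runtime uses the `parent` field), and `mostSpecificContext` is a hypothetical stand-in for the context the proposed API would return:

```js
// Walk from the most specific context up to the start rule, printing rule names.
let ctx = mostSpecificContext; // hypothetical: the context the API returns
while (ctx !== null) {
  console.log(parser.ruleNames[ctx.ruleIndex]);
  ctx = ctx.parentCtx; // the JS runtime's parent link
}
```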
The above assumes all semantic predicates default to true. I would need to write some specific tests to support execution of semantic predicates, but that only makes sense if you stick with your current multi-dialect grammar? Have you looked into grammar imports?
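If predicate execution were supported, a hedged sketch of how the earlier ATN walk could honor predicates instead of assuming them true (reusing the hypothetical `walk` from above, plus an `outerContext`):

```js
// Fragment that could slot into the earlier `walk` sketch, before the generic
// epsilon case: a PredicateTransition is epsilon, but carries a semantic
// predicate that can be evaluated against the parser.
if (t.constructor.name === 'PredicateTransition') {
  if (t.getPredicate().evaluate(parser, outerContext)) {
    walk(t.target, visited, onRuleEntered, onExpectedTokens);
  }
  // if the predicate evaluates to false, this edge is pruned
}
```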
See PR #4557
@seb314 Re 1), i.e. "our desired autocomplete suggestions are not individual tokens, but a sequence of tokens that corresponds to the input consumed by the rule", I have some questions re your example:
- since a path parser rule can be defined by something like `('/' name)+` (which could very well be a series of divisions), am I correct in assuming that suggesting real paths is bound to the rule rather than a token sequence?
- if your lexer had a token for paths, would support for token _sequence_ suggestions still be necessary? More generally, the scenario where only one specific token can follow is theoretically a good candidate for token _sequence_ suggestions. However, when there is more than one, the number of sequences explodes more than exponentially. Not sure how manageable that is... And I'm a bit skeptical about the UX when suggesting more than one word (except for literals such as full paths). Can you provide more insights?
> @seb314 Re 1), i.e. "our desired autocomplete suggestions are not individual tokens, but a sequence of tokens that corresponds to the input consumed by the rule", I have some questions re your example:
>
> * since a path parser rule can be defined by something like `('/' name)+` (which could very well be a series of divisions), am I correct in assuming that suggesting real paths is bound to the rule rather than a token sequence?

By "sequence of tokens" I meant: if we are in a `path` rule, then we want to suggest a complete path rather than an individual `name` element. So the actual use case is much simpler than arbitrary sequences.

> * if your lexer had a token for paths, would support for token _sequence_ suggestions still be necessary? More generally, the scenario where only one specific token can follow is theoretically a good candidate for token _sequence_ suggestions. However, when there is more than one, the number of sequences explodes more than exponentially. Not sure how manageable that is... And I'm a bit skeptical about the UX when suggesting more than one word (except for literals such as full paths). Can you provide more insights?
Resolved by the previous point. (Note: we are not using tokens for full paths for some quite specific reasons, but we think those reasons are not relevant for the general case.)
Re your previous message: I'll first have to dig in a bit and respond in more detail afterwards. My first guess re semantic predicates is that assuming them to be true will probably be good enough for us (either via grammar imports or with some token filtering in post-processing).
@ericvergnaud I've commented with a test case in https://github.com/antlr/antlr4/pull/4557#issuecomment-2007543770