antlr4ts icon indicating copy to clipboard operation
antlr4ts copied to clipboard

The generated Javascript code uses exceptions where it should just return undefined

Open mike-lischke opened this issue 7 years ago • 26 comments

The generated Javascript behaves differently in various places than other targets (including Java and Javascript). One example is ParserRuleContext.getToken. The generated code throws exceptions while it should simply return null. There are other places like getChild (and perhaps even other classes) where this happens. It is very important that the behavior of the generated JS code is the same like in any other target.

mike-lischke avatar Dec 30 '16 14:12 mike-lischke

Due to strict null checking in TypeScript, it's not practical for this target to exactly match the other targets. Doing so causes an overwhelming burden on developers working with things like listeners and visitors for cases where the parse tree is known to not contain syntax errors.

The solution for this target is the introduction of try* methods to match the other targets, such as the following pair in TokenStream:

LT(k: number): Token;
tryLT(k: number): Token | undefined;

The equivalent for the getToken method you mentioned would be tryGetToken, but that currently doesn't exist. As a workaround you can use getTokens and check the length of the result before getting the token at the expected index. The same applies to getChild, where the alternative is checking getChildCount or calling getRuleContexts depending on which overload you were using.

:bulb: If you encounter a case where you expected a try* method but it doesn't currently exist, file an issue (preferably one issue for each needed try* method) so we can add it.

sharwell avatar Dec 30 '16 15:12 sharwell

Hmm, I don't get the idea of the tryXX methods. If getToken would just return a null instead of throwing everything would be fine (in fact I modified the local antlr4ts code like that and it works nicely). Other targets also do strict type checking and still don't need a tryXX function. That sounds like big burden for me, actually.

And btw, it's a listener where I have this problem (constructing a symbol table). I check for certain tokens and act depending on whether they have a value or not:

export class DetailsListener implements ANTLRv4ParserListener {
    public tokenVocab: string;

    constructor(private symbolTable: SymbolTable, private imports: string[]) {}

    exitLexerRuleSpec(ctx: LexerRuleSpecContext) {
        if (ctx.TOKEN_REF() != null) {
            let symbol: string = ctx.TOKEN_REF().getText();
            if (ctx.FRAGMENT() != null) {
                this.symbolTable.addSymbol(SymbolKind.FragmentLexerToken, symbol, ctx);
            } else {
                this.symbolTable.addSymbol(SymbolKind.LexerToken, symbol, ctx);
            }
        }
    }
   ...

The line ctx.TOKEN_REF() != null throws the exception, which is wrong IMO.

mike-lischke avatar Dec 30 '16 15:12 mike-lischke

@mike-lischke getToken returns Token. If you want it to return undefined, the method signature needs to change to returning Token | undefined. This means you can never use the result of getToken without first checking if the result is actually a Token, even in cases where you know you supplied a valid index. Rather than require null checks everywhere, we chose to instead change the semantics so the result is non-null, with alternate methods provided where they make sense.

Other targets also do strict type checking and still don't need a tryXX function.

TypeScript distinguishes undefined and null from other types in the type system. No other supported target language has a feature even resembling this. In Java for example, the Token type is equivalent to "Token or null", and the compiler will allow you to omit null checks without a problem. If you omit the null check from equivalent TypeScript code, it's a compile-time error.

sharwell avatar Dec 30 '16 15:12 sharwell

Yes, I got the point about the null check, but this is how we do it in every other target. Why does the TS target go a different route (without real need, I'd say)? It's not only about syntax errors! It can easily happen that you cannot find a specific token if it is optional (and not given in the input). Those tryXX functions just make things more complicated, really. Why is it a problem to check for null (or undefined)? You would have to do this anyway with a tryGetToken function, just that you have yet another function for the same task. If you are 100% sure you always get a valid token then just omit the null check. Where's the problem?

mike-lischke avatar Dec 30 '16 15:12 mike-lischke

I think the direction I want to see it go is to expand on the current analysis for the number of times a particular element can appear in an alternative. Currently we distinguish between items that can appear at most once, and those that can appear more than once. What we need is to further distinguish between those that are exactly once and those that are optional.

Appearances Accessor current return type Desired return type
T? T T | undefined
T T T
T* T[] T[]
T+ T[] T[]

sharwell avatar Jan 02 '17 15:01 sharwell

Hmm, yes, I agree. As we see you would use the same return type for T* and T+ which are similar like T? and T, respectively, which makes me think (also for keeping the class API small) that a common T | undefined is the way to go.

mike-lischke avatar Jan 02 '17 15:01 mike-lischke

@mike-lischke I filed #265 for the primary issue of optional elements having accessors that return required values. If you have other specific requests please feel free to create additional issues for them.

sharwell avatar Jan 04 '17 04:01 sharwell

I have to reopen this issue again, because the provided solution doesn't work. In case of syntax errors I have no way of checking if a token exists, because even testing for it throws an exception. Is this really what you consider the simpler solution? Here's an example:

    visitParserRuleSpec = function (ctx: ParserRuleSpecContext): string {
        if (!ctx.ruleBlock()) {
            return "# Syntax Error #";
        }

        let diagram = "ComplexDiagram(" + this.visitRuleAltList(ctx.ruleBlock().ruleAltList()) + ").addTo()";
        this.scripts.set(ctx.RULE_REF().text, diagram);

        return diagram;
    }

The call to !ctx.ruleBlock() throws already. How am I supposed to check if that token exists? Wrap everything with a try/catch?

mike-lischke avatar Mar 17 '17 18:03 mike-lischke

I dropped the ball on this. Issue #128 was intended to address this, but the tryXxx() approach never seemed completely satisfactory, and I could not find a typescript best practice recommended.

Let's go back to the having getToken() etc. return a nullable, as Mike suggests, and provide a generic unNull( x : T | null ) : T function which can throw on null to simplify consuming code where full flexibility isn't needed. It is a breaking change, but strict null checking should quickly identify any code that needs to be changed.

BurtHarris avatar May 18 '17 08:05 BurtHarris

@BurtHarris This issue only arises when processing trees that contain syntax errors. In cases where users are working with these trees, the tryXxx() approach seems reasonable. Switching back places the burden instead on users working with trees that don't have syntax errors, which is a bad experience overall.

sharwell avatar May 18 '17 13:05 sharwell

@sharwell OK yes, looking back at all the places getChild() is used, I remember the strength of that motivation. I see ParserRuleContext already has a tryGetChild() and tryGetRuleContext() as expected from the list in #128, but RuleNode.tryGetChild() is still missing.

What I don't understand is Mike's example using ctx.ruleBlock()..., I can't find ruleBlock anywhere in antlr4ts. Is he talking about some codegen produced method?

BurtHarris avatar May 18 '17 16:05 BurtHarris

@mike-lischke Oh, I see, your ruleBlock example comes from the grammar in antlr4-graps, is that right?

I still don't fully understand that, but please have a look at the PR above and let me know if its an approach you can live with, or if there are more methods like getChild that need a tryGetChild in the runtime.

P.S. An alternative I considered to tryXxx() some time back was to add an optional throws: Boolean parameter in an overload to methods like getChild, something like this:

getChild(i: number): ParseTree;
getChild(i: number, throws: Boolean): ParseTree | undefined;

I'd be interested in your feedback, do you like that any better?

ParserRuleContext.getChild() it already has an overload that takes one optional parameter ctxType, which actually controls the return type of the method. How to order the two optional parameters wasn't exactly clear to me. While I wrote this code, I really don't have enough experience actually using ANTLR to fully predict usefulness in scenarios involving syntax error recovery, etc.

BurtHarris avatar May 18 '17 18:05 BurtHarris

tryGetRuleContext() is not a nice replacement for normal rule context retrieval, because it requires the user to lookup the index of the context manually (which is otherwise done on code generation). I understand that because of the strict null check, when allowing undefined on rule getters, an application would either have to force not null (via the ! operator) or check if the returned value is defined, which is annoying when you know it must be correct.

But on the other hand it's annoying to use tryXXX methods in all other cases. In fact, to be on the safe side you would then always use tryXXX instead of the normal getters, unless you can 100% be sure your input is always correct (which I believe is an illusion). But when you use tryXX anyway, why having the normal context retrieval functions at all?

IMO the lesser evil is that (in a strict null check environment) you have to always check for undefined results, even if you believe you always get only correct input. Every other ANTLR target has to do the same checks as well. Why is it unacceptable for a TS target user to do the null checks too? What I try to avoid is also that antlr4ts behaves differently than other targets, without a real benefit. Of course YMMV, but I like to be able to easily port examples written for a different target easily to the typescript target (and in fact I frequently write C++ implementations first and then port them to typescript).

mike-lischke avatar May 19 '17 08:05 mike-lischke

@mike-lischke If your parse operation had syntax errors, don't run the visitor at all. For most user scenarios, most visitors are not only fine to run like that, but are actually cleaner and produce a more consistent user experience. When implementing code completion, it's more work to handle errors, but it's not a problem many users will ever need to be involved with.

unless you can 100% be sure your input is always correct

Outside of bugs in ANTLR, we are. All you need is this:

if (parser.numberOfSyntaxErrors !== 0) {
  // Do not run visitor
  return;
}

tryGetRuleContext() is not a nice replacement for normal rule context retrieval, because it requires the user to lookup the index of the context manually

I'm not seeing any place where you need to know an index that you wouldn't otherwise need to know the index when working with the generated code. Do you have an example?

sharwell avatar May 19 '17 13:05 sharwell

If your parse operation had syntax errors, don't run the visitor at all.

That's not an option. In order to have a passably useful symbol table you have to visit the produced parse tree in any case. Remember the syntax error could also be after the caret position, so everything before that is valid and needs to be considered (e.g. for code completion).

I'm not seeing any place where you need to know an index that you wouldn't otherwise need to know the index when working with the generated code. Do you have an example?

Sure, look at this generated code for an ANTLR4 parser. It calls this.getRuleContext(0, RuleBlockContext);. The index 0 is defined by the code generator. Now look at this visitor code. Here I have to use ctx.tryGetRuleContext(0, RuleBlockContext) (there is no function ctx.tryRuleBlock()). In order to use tryGetRuleContext I have to find out which index to use for the call (not to mention that if (!ctx.tryGetRuleContext(0, RuleBlockContext)) { is less attractive than just if (!ctx.ruleBlock()) {.

mike-lischke avatar May 20 '17 11:05 mike-lischke

OK @sharwell, thinking over @mike-lischke's generated-code example has me convinced that sometimes it's better to lie to the type system.

The TypeScript declaration of Array<T> is prototypical example for JavaScript: [n] is not declared to return T | undefined, even though in reality an out-of-bounds index will return an undefined (no need to throw an exception unless you try to use the value returned.)

By extension I suggest the generated code wrappers for syntactically-required parts of the parse tree should lie, we don't want tryXxx wrapper methods for them. The logical way to do so is declare them : T, even though : T | undefined is more accurate. They don't need to throw because unwary code will throw on it's own. Error aware code should verify truthy-ness of even apparently required syntax elements.

Carried to it's logical conclusion, I think this eliminates the justification for all tryGetXxx(n) methods, with the Array<T>.find() method being a logical replacement for the two argument getXxx(n, ctx) and tryGetXxx(n, ctx) overloads, which is something I would celebrate.

BurtHarris avatar May 21 '17 03:05 BurtHarris

That's not an option. In order to have a passably useful symbol table you have to visit the produced parse tree in any case. Remember the syntax error could also be after the caret position, so everything before that is valid and needs to be considered (e.g. for code completion).

Code completion is overall a minority case. And we're talking a minute fraction case here - maybe dozens of developers compared to many thousands. It may be the majority case for what you are working on, but I do not find such a limited case to be substantially motivating to detract from the type safety being provided for the majority case. You will need to use the try* methods in many cases, but most ANTLR-based applications will not need them at all.

The index 0 is defined by the code generator.

The index is always 0.

OK @sharwell, thinking over @mike-lischke's generated-code example has me convinced that sometimes it's better to lie to the type system.

I agree that Mike's code could be simplified with an alternate approach. However, as a minority case overall I do not believe it outweighs the benefits the current implementation provide for the majority case.

sharwell avatar May 21 '17 08:05 sharwell

@BurtHarris. I fully support your thoughts here, however how can you return undefined when you haven't declared a return value of T | undefined? I just tried. bit TS won't let your return undefined in that case.

mike-lischke avatar May 21 '17 08:05 mike-lischke

@sharwell, How's not throwing an exception corrupting the type safety in antlr4ts? You also return undefined for optional elements, without having trouble with type safety. I agree you shouldn't optimize for the special cases, but always for the general case (majority use), though I don't consider this here as special case (ignore the connection to code completion for a moment). It's s a general problem that antlr4ts uses a different approach than any other ANTLR4 target, for no good reason.

mike-lischke avatar May 21 '17 08:05 mike-lischke

How's not throwing an exception corrupting the type safety in antlr4ts?

Either the return value is allowed to be undefined, or it is not. There are three options when you cannot statically verify the inputs to a method producing such a value:

  1. Throw an exception
  2. Return a sentinel
  3. Break out of the type system, and return undefined even though it claimed not to

You also return undefined for optional elements, without having trouble with type safety.

Optional elements use a different return type in the generated code than required elements use. Unlike other targets, this target differentiates between optional and required elements in the generated code.

It's s a general problem that antlr4ts uses a different approach than any other ANTLR4 target

This is the only target+language combination capable of expressing this concept. It is equivalent to the other targets within the bounds of the expressiveness of the languages.

for no good reason

I worked on this target for the primary purpose of better understanding the capabilities/expressiveness of the TypeScript type system and the ways it relates to real-world applications. As such, succinctness of code interacting with generated parsers and certain conceptual differences relative to other language targets are currently secondary concerns for this particular target. My goal is to produce optimally usable code within the bounds of adhering to the limitations of the type system.

sharwell avatar May 21 '17 08:05 sharwell

OK, I see we obviously have different goals and we should stop endlessly arguing. I can just fork antlr4ts and change that to what I need/want, no problem. Thanks.

mike-lischke avatar May 21 '17 09:05 mike-lischke

@mike-lischke If our goals would be considered competing with one another, that would be a pretty substantial negative mark on the type system for the language. I am still interested in addressing the usability concerns that remain. There are two specific cases where things remain potentially clumsy for error handling scenarios. In each of these cases, the alternative to handle errors is quite a bit longer than the short form, but not difficult to determine (e.g. no need to look up indexes in the generated code).

  1. Accesses to a rule or token which appears exactly once in the rule for all paths. This is emitted as a TContext (single, non-optional). The equivalent code to get this as an optional value to handle errors is ctx.tryGetRuleContext(0, TContext), which is longer but trivial to determine from ctx.t().
  2. Accesses to a specific indexed element for a rule or token which can appear more than once. This would have the form ctx.t(i), but if you need to handle error scenarios it would be ctx.tryGetRuleContext(i, TContext).

Assuming that we cannot change ctx.t() or ctx.t(i), what would you like to write instead of the long form as you are working?

sharwell avatar May 21 '17 09:05 sharwell

@sharwell: This is really a matter of conflicting style not conflicting goals. Its unfortunate that Mike felt blocked, but I feel accountable for not working through this earlier. But in view of ts-flavor, the assumption above about not changing ctx.t() or ctx.t(i) tastes bad to me.

It all goes back to issue #8. We began the job with PR #266, #268, but #8 is really not complete. PR #270 didn't extend the pattern of using properties rather than methods. I think client code should be using property ctx.t rather any method(s). The idea of using different methods for syntax-safe and syntax-unsafe situations seems really distasteful to me. Instead I think we can have different declarations of identically named properties on tool generated interfaces appropriate for to those two different situations. That way code can use conventional JavaScript shortcuts like if (ctx.t) ... or (ctx.t || {}).foo in either case, but is only forced to when needed. Since TypeScript 2, ctx.t!.foo can be used to cheat too.

For discussion purposes lets call these ISafe<R>andIUnsafe<R>` (no implication they need to be generics.) Assuming:

grammar Json;
pair
    : name=JString ':' value
    ;
obj
   : '{' pair ( ',' pair )* '}'   
   | '{' '}'                   
   ;
...

Isn't this close to what we want to be generating...

export interface IUnsafePair { 
    name: Token | undefined;
    value: IUnsafeValue | undefined;
}
export interface ISafePair extends IUnsafePair {
    name: Token;
    value: ISafeValue;
}

// potentially non-exported implementation details...
class PairContext implements IUnsafePair, ISafePair {
	name: Token;
	_value?: IUnsafeValue;
	get value() { return this._value!}
}

export interface IUnsafeObj {
    pair: Array<IUnsafePair>;
}
export interface ISafeObj extends IUnsafeObj {
   pair: Array<ISafePair>;
}

How does ErrorNode enter into this?

BurtHarris avatar May 21 '17 22:05 BurtHarris

@mike-lischke I understand re forking, and wanted to point out that the npm link and bundle features may help make this less painful and more easily reversible. Personally, I'd like to see your fork. I've already used link in the build scripts to deal with an ugly build-order issue and enable testing on a fork, but not used bundle (yet). Your work is advanced enough it might justify looking into it.

BurtHarris avatar May 21 '17 23:05 BurtHarris

No worries, I'm just tired to discuss this issue.

mike-lischke avatar May 22 '17 06:05 mike-lischke

@mike-lischke @BurtHarris I filed antlr/antlr4#1972 to really truly fix this the right way. Once the new error recovery strategy is implemented, 1) the tryGet* methods can all go away, 2) the accessors can be used in visitors for both correct and incorrect code, and 3) the methods can continue throwing exceptions when used incorrectly.

This will be a milestone achievement for ANTLR's error recovery IMO, and likely never would have been possible without all the opposing pressure on both sides in the discussion above.

sharwell avatar Jul 26 '17 15:07 sharwell