JuliaSyntax.jl
Error recovery for unexpected continuation keywords
Consider this input, in which the ternary is missing its final operand just before the elseif:
julia> JuliaSyntax.parse(JuliaSyntax.GreenNode, "if true; x ? true : elseif true end")[1]
1:35 │[toplevel]
1:35 │ [if]
1:2 │ if
3:3 │ Whitespace
4:7 │ true ✔
8:26 │ [block]
8:8 │ ;
9:26 │ [if]
9:9 │ Whitespace
10:10 │ Identifier ✔
11:11 │ Whitespace
12:12 │ ?
13:13 │ Whitespace
14:17 │ true ✔
18:18 │ Whitespace
19:19 │ :
20:20 │ Whitespace
21:26 │ [error] ✘
21:26 │ elseif ✔
27:27 │ Whitespace
28:31 │ [error] ✘
28:31 │ true ✔
32:32 │ Whitespace
33:35 │ end
This special case is fixed by #77 by punting the elseif into the containing block instead:
julia> JuliaSyntax.parse(JuliaSyntax.GreenNode, "if true; x ? true : elseif true end")[1]
1:35 │[toplevel]
1:35 │ [if]
1:2 │ if
3:3 │ Whitespace
4:7 │ true ✔
8:19 │ [block]
8:8 │ ;
9:19 │ [if]
9:9 │ Whitespace
10:10 │ Identifier ✔
11:11 │ Whitespace
12:12 │ ?
13:13 │ Whitespace
14:17 │ true ✔
18:18 │ Whitespace
19:19 │ :
20:19 │ error ✘
20:20 │ Whitespace
21:32 │ [elseif]
21:26 │ elseif
27:27 │ Whitespace
28:31 │ true ✔
32:32 │ [block]
32:32 │ Whitespace
33:35 │ end
Of course, that naive solution only works when exactly one token is missing or extraneous, so "if true; x ? true : foo ))))) elseif true end" will break it again.
Generally, this should be solvable by an arbitrarily long look-ahead for continuation keywords, but I really don't like that solution (and it might not even work in all cases).
> Generally, this should be solvable by an arbitrarily long look-ahead for continuation keywords
If we cap the lookahead at some large but not huge value, I feel this is acceptable. But recovery and error reporting really are hard: in general, doing a good job requires both lookahead and lookbehind.
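To make the capped lookahead idea concrete, here is a minimal sketch in plain Julia. It is not JuliaSyntax's recovery code and uses none of its API: the function name `find_continuation`, the `max_lookahead` parameter, its default of 64, and the representation of tokens as a vector of strings are all assumptions made for illustration. It scans forward from the error position, skips over nested blocks, and gives up once the cap is reached.

```julia
# Hypothetical sketch: NOT JuliaSyntax's actual recovery logic.
# Tokens are modelled as plain strings to keep the example self-contained.

const CONTINUATION_KEYWORDS = ("elseif", "else", "catch", "finally", "end")
const BLOCK_OPENERS = ("if", "for", "while", "let", "try", "begin", "function",
                       "macro", "module", "struct", "quote", "do")

"""
Return the index of the first continuation keyword at nesting depth 0,
looking at no more than `max_lookahead` tokens starting from `start`.
Return `nothing` if none is found within the cap.
"""
function find_continuation(tokens::Vector{String}, start::Int; max_lookahead::Int=64)
    depth = 0
    stop = min(lastindex(tokens), start + max_lookahead - 1)
    for i in start:stop
        tok = tokens[i]
        if tok in BLOCK_OPENERS
            depth += 1              # entering a nested block
        elseif depth == 0 && tok in CONTINUATION_KEYWORDS
            return i                # keyword belongs to the enclosing block
        elseif tok == "end"
            depth -= 1              # leaving a nested block
        end
    end
    return nothing                  # cap reached: fall back to other recovery
end

# The junk tokens after `:` from the example above.
tokens = ["foo", ")", ")", ")", ")", ")", "elseif", "true", "end"]
find_continuation(tokens, 1)                   # finds the `elseif`
find_continuation(tokens, 1; max_lookahead=3)  # gives up within the cap
```

This deliberately ignores cases a real parser would have to handle, such as `end` used inside indexing (`a[end]`) or keywords appearing inside strings and comments, which is part of why lookahead alone may not work in all cases.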
Any parsing algorithm which assumes the source is "well-formed by default" is likely to enter weird states where there's not enough local information available in the program to emit the ideal error. Hence wanting #93 or something similar... (really, I believe parsing of broken text and emitting error messages and warnings are in the "big pile of heuristics" category of software, which in general is better learned with a data-driven ML approach... but that would be a research project ...)