Diagnostics are too chatty
I've started implementing a linter that will eventually power the LS. For now all it does is use the diagnostic messages from JuliaSyntax. https://github.com/davidanthoff/StringBuilders.jl/pull/96/files is an example. There is one syntax error, a missing ) on line 14 and the error message for that is nice.
But then all the other error messages after it I feel ideally would not appear at all. In my mind once the parser recognizes that the error on line 14, it should continue parsing pretending that the error was fixed and then not report all these knock-off errors further down in the file.
This is actually probably a reason to not surface the diagnostics in the VS Code extension yet. CSTParser doesn't have a nice error message, but it just stops processing the file after this first error, which at the end of the day is probably a better user experience because the error list is not populated by lots of false-positive like things.
FYI @pfitzseb
You can chose what to propagate, or? Analogous to https://github.com/JuliaLang/JuliaSyntax.jl/pull/344
That is true, we can easily go back to the CSTParser situation by just reporting the first error. Having said that, it would be nice if the error recovery here would be better :)
https://github.com/dotnet/roslyn/issues/16151#issuecomment-269741300 is interesting. I would imagine that in the case I linked to, we detect that a closing ) is missing, and then bubble up one level and continue parsing normally, as if that ) was there, right?
I started to surface JuliaSyntax.jl errors and warnings in the VS Code extension now (and also in a new GitHub lint action). And I'd really like to not just show the first diagnostics. If there are multiple errors in a file, it is really useful to show them all, so I think for now I'm going to err on the side of showing too many errors, hoping that we can gradually improve the error recovery here in JuliaSyntax.
Yeah, we don't recover well from this kind of thing. Hence #344 :-(
Recovery is a whole thing I haven't worked on. Partly because I don't believe it's possible to do it well with a big hand-written pile of heuristics. At least, not without a heroic amount of engineering effort.
I'd like to do it in a data-driven way instead because I think that's a good long term solution. ("Data driven" == a user-contributed database of common errors and their fixes along with human-readable explanations. Then use that in conjunction with a pluggable recovery system - initially perhaps a hand-written pattern matcher, but I think a learned model is the ultimate solution to this.)
Take everything that follows with a grain of salt because I'm still trying to understand JuliaSyntax and might mix things up :) But here we go.
If I understand what is happening in this case of a missing closing parenthesis for a function, then https://github.com/JuliaLang/JuliaSyntax.jl/blob/f99b76f0108b467c06e806baea52096b4b35035f/src/parser.jl#L144 handles that. And the logic seems to be that it if there isn't a ) where it expects that, then it just keeps looking for the next ) and marks everything in between as an error node. Is that right?
If that is so, then the issue here doesn't seem to be a pattern matching system that sits on top of the nodes, but instead I think the actual nodes that are generated would have to look differently and ideally follow the strategy that is described in the C#/Roslyn issue I linked to above. If they don't find a closing parenthesis, the first thing they try is see whether the next token can be successfully parsed by the "context" above the current one, and if so continue in that way. So here the context that is missing the ) is the function arguments, and the context above would be the statements in the function body, and so Roslyn would just check whether the next thing that isn't the closing ) is a valid statement (presumably), and if so, contineu with that.
I'm not sure how that would be implemented here, though, as I looks to me as if inside bump_closing_token there is no info what the next higher context would be, so it seems hard to test whether the next token would be valid there, right?
And the logic seems to be that it if there isn't a ) where it expects that, then it just keeps looking for the next ) and marks everything in between as an error node. Is that right?
Yes, it's pretty much the simplest possible recovery algorithm. It's not good :laughing:
I'm not sure how that would be implemented here, though, as I looks to me as if inside bump_closing_token there is no info what the next higher context would be, so it seems hard to test whether the next token would be valid there, right?
Right. My understanding is that we'd have to manually push onto a stack of parent contexts as we recursively call the parsing functions. Essentially we're keeping a "shadow stack" which mirrors the actual program stack but is accessible at the source level.
It's not the worst thing ever and maybe could even be automated with a macro to wrap a push/pop around every parse_* function call body. But it's also not lovely. IIRC the rust-analyzer parser also does this kind of thing. And it seems like a reasonable amount of engineering effort to get something a lot better than what we have now.
I still don't think that kind of heuristic can ever be "really great", but if it's fairly easy it could be a worthwhile win.
https://github.com/dotnet/roslyn/issues/16151#issuecomment-269741300 is interesting. I would imagine that in the case I linked to, we detect that a closing ) is missing, and then bubble up one level and continue parsing normally, as if that ) was there, right?
Btw if someone has the energy to make an experimental prototype of this and demonstrate what it could give us - and what the cost is in code complexity - that would be amazing. Doesn't have to be fancy - just a demo to give a taste.
I'm really busy right now so I can't write the code but I could look at a prototype.