kotlin
kotlin copied to clipboard
KT-8263: Conditional operators are not parsed correctly
The parsing of type argument lists now differentiates depending on the grammatical environments they appear in. There are two environments: strict and mixed. In strict environments there cannot be any grammar interactions with comparison expressions, whereas in mixed environments such interactions need to be accounted for. Strict environments can be found pretty much everywhere except in blocks/lambdas. Examples for strict environments are type references in properties or value parameter list of functions. Inside of blocks/lambdas the mixed environment allows for interactions between call expressions such as mutableListOf<Here>()
and comparison such as some(x < y, 0 > (y + z))
.
It is crucial, that the parser always produces a type argument list, if there is a grammatically correct call expression, even in cases where it could also be parsed as a grammatically correct comparison. The call expression takes precedence over the comparison, as a call expression cannot be written in any other way, unlike comparisons, which can easily be expressed differently. On the other hand, a type argument list must not be produced, if there is a grammatically incorrect function call, but a grammatically correct comparison. In that case, a comparison has to be produced. If a section is neither a grammatically correct call expression nor comparison, then the parser is free to chose whatever option it assumes fits best. The first two cases are required for correctness. The third case is important for helpful error highlighting.
some(x < y, 0 > (y + z)) // comparison
some(x < y, x > (y + z)) // function call
some(x < y, (y + z) < x) // rewritten comparison
some(x < y, () > (y + z)) // neither
Algorithm
Whether the parser produces a type argument list depends on the following conditions.
KotlinParsing.tryParseTypeArgumentList(boolean)
parses a section and assigns it one of three kinds:
-
TYPE_ARGUMENT_LIST
if the environment is strict, the type argument list is empty<>
or if the type argument list did not contain any syntactical errors. This case implies, that the section is most likely a type argument list. -
FAULTY_TYPE_ARGUMENT_LIST
if none of the above conditions apply and the type argument list has a GT at the end, meaning that it is at least complete<...>
. This case implies, that the section could be type argument list, that simply contains errors. But it could also be some sort of comparison. Further checks have to be done in this case. -
NONE
if none of the above conditions apply. As the name implies, the section cannot be considered a type argument list.
KotlinExpressionParsing.parseCallSuffixTypeArgumentList()
decides based upon this determined kind and type argument list indicators (see below) whether a type argument list should be produced or not.
- If the kind is
NONE
, no type argument list will be produced. - If the kind is
FAULTY_TYPE_ARGUMENT_LIST
the parser has to check whether the section could also be correct comparison. In order to do this, it rolls back the type argument list and tries to parse the section as the right-hand side of a comparison. If there are no syntax errors, no type argument list will be produced. Otherwise the right-hand side of the comparison will be rolled back, parsed as a type argument list once more and a type argument list will be produced. This whole check is only done, if the section is followed by a strong type argument list indicator. - If the kind is
TYPE_ARGUMENT_LIST
and if the sections is followed by any type argument list indicator, a type argument list will be produced.
Type Argument List Indicators
In order to decide whether a type argument list should be produced, the tokens after the section have to be inspected. There are a number of indicators, that either clearly state the necessity of the production of a type argument list or at least hint towards the intention of such existence. The former indicators are referred to as strong, whereas the latter are considered weak.
Strong Type Argument List Indicators
In order of precedence:
- The start of a value argument list or an annotated lambda indicates an intended type argument list.
<...>(
<...>{
<...>@identifier{
<...>identifier@{
- Postfix expressions are usually not applied to non-null boolean comparisons.
<...>::
<...>?
<...>.
<...>?.
<...>!!
<...>++
<...>--
Weak Type Argument List Indicators
- Line breaks are only available outside of value argument list. This is an incomplete line of code.
<...>↵
- Other limiting tokens are also a likely indicator of an indented function call.
<...>)
<...>]
<...>}
<...>,
<...>:
<...>=
- Operators are unlikely to be applied to non-null booleans.
<...>==
<...>!=
<...>===
<...>!==
<...> as
<...> is
<...> in
- Hard keywords hint towards some formatting issue where line breaks are missing
<...>fun
<...>var
<...>var
<...>object
Coverage
The current approach covers all cases when it comes to correctness. If the code compiles without errors, then the parser will decide correctly. It also covers a significant part of the error spectrum, aiding the developer in understanding their mistakes.
Other
KT-35811 as covered in #4771 is re-included in this PR, because of the previous rollback in relation with KT-8263.
I had to update the approach to cover to the error cases. This far, any error in the type argument list would result in it being dropped. Therefor error highlighting and autocompletion would have been very spotty while filling out the type argument list of call expression. Writing calls like mutableListOf<()->>()
, where the return type of the function type hasn't been typed yet would not receive proper highlighting. The behaviour should now be significantly smoother with this adjusted approach.
The changes primarily focus on KotlinParsing.tryParseTypeArgumentList(boolean)
and KotlinExpressionParsing.parseCallSuffixTypeArgumentList()
. I updated the description in the PR accordingly.
As I was writing the test I discovered, that the parse could run into an endless loop due to the changes I made in order to aid recovery. In addition to that, I realised, that I require previous parts of the parser to properly mark the affected places when targeting older language versions. While I found several ways to make it work, it would have increased the scope of the change and the risk significantly. Therefor I decided to go with what you previously suggested and rolled back the improvements I made to the recovery. The parser now behaves relatively close to what it used to, again, except that the bug itself is fixed, of course. In the future it would be beneficial to improve the recovery, though. It has become slightly worse compared to what it was before the change.
The whole test situation was a bit unfortunate. There were a few tests that admittedly slipped past me, but there was also a change in the FIR compiler, that led to the rightful appearance of an UNRESOLVED_REFERENCE
error due to a bad interaction with the freshly added KtNodes.TYPE_ARGUMENT_LIST_LIKE_EXPRESSION
error node. The change was introduced after my push and before you first reported the test failures. Then, in the time in between my next push and your second attempt, that said change was rolled back or broken again and the rightful error was no longer being reported.
The good thing about this unfortunate series of events is, that we got hold of the bad interaction with the freshly introduced error node. The problem was as follows:
The TYPE_ARGUMENT_LIST_LIKE_EXPRESSION
error node sits in between the left-hand-side of a binary expression and the operator reference.
EXPRESSION TYPE_ARGUMENT_LIST_LIKE_EXPRESSION OPERATOR_REFERENCE EXPRESSION
The old frontend has no problem with it being there, because it only looks for KtExpression
elements and OPERATION_REFERENCE
nodes. However, the FIR is much more generous when it comes to accepting nodes. The ExpressionConverter.convertBinaryExpression
simply takes the rightmost node left to the operator reference as the left-hand-side, if it is an "expression" by FIR standards. The TYPE_ARGUMENT_LIST_LIKE_EXPRESSION
error node qualified as an expression, because it was of type KtNode
.
Now, I am not qualified to judge over what counts as an expression in FIR, so I made no change to the conversion. I instead changed the type of TYPE_ARGUMENT_LIST_LIKE_EXPRESSION
from KtNode
to IElementType
. I don't know, if there is a better type to be used here instead.
In the future that conversion probably has to improve. For instance, right now, it parses the right-hand-side as the rightmost (and NOT the leftmost) expression node to the right of the operator reference.
Now, it's almost done!
But unfortunately the test org.jetbrains.kotlin.test.runners.DiagnosticTestGenerated$Tests$TypeArguments.testForbidTypeArgumentListLikeExpressions
is failing
Hm, doesn't the Gradle run configuration "Analysis All Tests" execute all tests? It seems to not do that, at least not all of the time.
Anyway, I forgot to adjust the TypeArgumentListLikeExpressionsChecker
accordingly after I changed the type of the KtNodeTypes.TYPE_ARGUMENT_LIST_LIKE_EXPRESSION
.
I merged your changes manually and then unfortunately had to revert them because IDE plugin tests start failing :(
The tests are the following
It seems that after your changes, the code like HashMap<$$some-special-character-for-completion
starts being parsed as a condition instead of a type argument list that leads to redundant suggestions of functions instead of types.
Sorry, I don't have resources now for deeper investigation, I could only suggest to reproduce it via setuping Cooperative project locally.
No problem. I have already checked out the intellij repository and I'll take a look at the issue.
After understanding what Intellij does to perform the completion of incomplete statements, I adjusted the strong indicator list to contain TokenType.BAD_CHARACTER
. With this change the mentioned tests now run.
For context:
Intellij completes code like HashMap<String,
with HashMap<String, IntellijIdeaRulezzz>$
in an attempt to get it to compile. $
is considered a bad character everywhere except in strings and in front of identifiers (legacy syntax). Previously the bad character didn't matter, but now it must be considered as an indicator of a type argument list in order to conform to the existing assumptions.
Merged your changes manually, it seems like everything went fine. Thank you for your work and patience when fixing these complicated issues!