Tuple struct field is parsed as float
use std::collections::HashMap;

pub struct Graph(HashMap<i32, Vec<i32>>);

impl Graph {
    pub fn new() -> Self {
        Graph(HashMap::new())
    }

    pub fn add_edge(&mut self, u: i32, v: i32) {
        self.0.<|>
    }
}
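For reference, here is one plausible completed body (the HashMap calls are illustrative, not part of the original report); completion after self.0. should be offering exactly these HashMap methods:

use std::collections::HashMap;

pub struct Graph(HashMap<i32, Vec<i32>>);

impl Graph {
    pub fn add_edge(&mut self, u: i32, v: i32) {
        // `self.0` accesses the wrapped HashMap by position; completion
        // after `self.0.` should list HashMap methods such as `entry`.
        self.0.entry(u).or_default().push(v);
    }
}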
In the case of
pub struct S;

impl S {
    pub fn blah(&self) {}
}

struct T(S);

impl T {
    fn foo(&self) {
        self.0.<|>
    }
}
It looks like "0." is treated as a LITERAL:
L_CURLY@[5619; 5620) "{"
WHITESPACE@[5620; 5630) "\r\n "
EXPR_STMT@[5630; 5635)
  FIELD_EXPR@[5630; 5635)
    PATH_EXPR@[5630; 5634)
      PATH@[5630; 5634)
        PATH_SEGMENT@[5630; 5634)
          SELF_KW@[5630; 5634) "self"
    DOT@[5634; 5635) "."
err: `expected field name or number`
err: `expected SEMI`
LITERAL@[5635; 5637)
  FLOAT_NUMBER@[5635; 5637) "0."
WHITESPACE@[5637; 5643) "\r\n "
R_CURLY@[5643; 5644) "}"
Shouldn't struct T(S); be a TupleStruct? I don't see much about them in our code.
That is expected, b/c 0. is a float literal. Note, however, that completion inserts a dummy identifier (see how CompletionCtx is constructed), so that shouldn't be a problem.
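As a rough illustration of that trick (the names here are made up, not the actual rust-analyzer internals): before analysis, completion splices a dummy identifier into a copy of the source at the cursor, so the file parses as if a field name were present.

// Conceptual sketch only; the real CompletionContext logic differs.
fn insert_dummy_ident(source: &str, cursor_offset: usize) -> String {
    let mut patched = String::with_capacity(source.len() + "dummy".len());
    patched.push_str(&source[..cursor_offset]);
    patched.push_str("dummy"); // e.g. "self." becomes "self.dummy"
    patched.push_str(&source[cursor_offset..]);
    patched
}

With the dummy in place, self.0.dummy lexes as self, ., 0, ., dummy (a dot followed by an identifier never gets glued into a float), which is why completion has a chance of working even though 0. lexes as a float in the unpatched file.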
> Shouldn't struct T(S); be a TupleStruct? I don't see much about them in our code.
I haven't looked at this at all, it might be the case that "nothing works b/c nothing is supported", but it shouldn't be too hard to fix. I suggest just tracing back from the completion to see where we get wrong results.
In principle, this should work, but there are no tests for typing tuple_struct.0 correctly, so it might actually be broken for some reason (I haven't investigated further so far)...
The CompletionContext.dot_receiver isn't being filled. It looks like we aren't passing the test at https://github.com/rust-analyzer/rust-analyzer/blob/42a883f06c28ddeab22e5703a578f19110dde7f3/crates/ra_ide_api/src/completion/completion_context.rs#L214
As an aside, I couldn't debug this at all under Windows, and my VM likes to churn at 100% disk usage :D
I have opened #1117 to add inference support to tuple struct indexing. However, this doesn't fix the issue as the lexer still finds a float literal in the autocompletion case and I don't know how to recover and parse it as int + dot.
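For context, tuple-struct indexing inference is about code like the following (a plain-Rust example, not from the issue):

struct Pair(i32, bool);

fn project(p: Pair) -> (i32, bool) {
    // `.0` and `.1` are positional field accesses; inference should
    // assign them the types of the corresponding tuple-struct fields.
    (p.0, p.1)
}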
So the problem here is that we can't properly analyze an incomplete file due to the stupid lexer :) The tree for
{ self.0.<|> }
is
BLOCK@[198; 221)
  L_CURLY@[198; 199) "{"
  WHITESPACE@[199; 208) "\n "
  FIELD_EXPR@[208; 215)
    PATH_EXPR@[208; 212)
      PATH@[208; 212)
        PATH_SEGMENT@[208; 212)
          SELF_KW@[208; 212) "self"
    DOT@[212; 213) "."
    err: `Tuple (struct) field access is only allowed through decimal integers with no underscores or suffix`
    FLOAT_NUMBER@[213; 215) "0."
  WHITESPACE@[215; 220) "\n "
  R_CURLY@[220; 221) "}"
Note that we have a FIELD_EXPR for self., but no expression for self.0. I think the cleanest fix here is to just special-case HIR construction: if a field expression is missing its field name, and the immediately following token is a float literal which ends in ., we should fix it up into a field-access expression. The relevant code is here:
https://github.com/rust-analyzer/rust-analyzer/blob/d55f1136d6444b1f50b9092c36a976d0e1c26202/crates/ra_hir/src/expr.rs#L690
We probably will need to patch completion context as well:
https://github.com/rust-analyzer/rust-analyzer/blob/d55f1136d6444b1f50b9092c36a976d0e1c26202/crates/ra_ide_api/src/completion/completion_context.rs#L203
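A minimal sketch of the detection step that fix-up would need (a hypothetical helper, not the code behind either link): given the text of the float token that follows the dot-terminated FIELD_EXPR, recover the intended tuple index.

// Hypothetical: if a field expression has no field name and the next token
// is a float literal ending in '.', the digits before the dot are the
// intended tuple index (e.g. "0." -> 0).
fn tuple_index_from_float(token_text: &str) -> Option<usize> {
    let digits = token_text.strip_suffix('.')?;
    if !digits.is_empty() && digits.chars().all(|c| c.is_ascii_digit()) {
        digits.parse().ok()
    } else {
        None
    }
}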
Question: why don't we fix the parser/lexer instead?
It seems like this would require the parser to know the text of the tokens, or the lexer to know the preceding token, both of which seem less than ideal. But maybe I am missing some clean way to fix this?
> or the lexer to know the preceding token
We do not need to know the whole previous token, only that the previous char is a DOT. Would it be enough to add a prev function to lex::Ptr and change scan_number to handle that case?
edit: grammar and typo
a single char wouldn't be enough: 0.0..0.9 should parse as a range. In general, I'd avoid adding lookahead/lookbehind to the lexer
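The range case in compilable form (standard Rust, just to spell the point out):

fn main() {
    // The `0` that starts the second float follows a '.' character, yet it
    // must begin a new float token: "previous char is a dot" can't
    // distinguish this from the `self.0.` case.
    let r = 0.0..0.9;
    assert!(r.contains(&0.5));
}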
Triage: completion works here, but the expression is still parsed as a float.

Here's a simple reproduction. The outer .0 projects out the inner tuple (i32,), and the second .0 should then yield the i32 inside; instead, this infers fld as (i32,) when it should be i32:
fn main() {
    let fld = ((0,),).0.0;
}
Can I try to fix this one? :eyes:
Sure :-).
I don't think this is going to be properly fixable without fixing it at the parser/lexer stage. Consider the way tup.0.0 is currently lexed:
FIELD_EXPR@0..7
  PATH_EXPR@0..3
    PATH@0..3
      PATH_SEGMENT@0..3
        NAME_REF@0..3
          IDENT@0..3 "tup"
  DOT@3..4 "."
  FLOAT_NUMBER@4..7 "0.0"
Because 0.0 is its own token, there is no way to refer to tup.0 as a syntax node by itself (even if the parse tree looked different). That would make it an expression with no corresponding source node, which means IDE features wouldn't work on it (extend selection and hover come to mind, but others would probably break too).
I think that leaves 2 options:
Option A – change the lexer: I've played with the idea of changing how floats are lexed at the rustc_lexer stage: they could be split up into a "pre-decimal" part, the ., and a "post-decimal" part, and the parser could then reassemble them when parsing a literal. This might cause problems with proc macros, though, since AFAIK they get direct access to the lexer output. Perhaps we could instead split up float tokens coming from rustc_lexer when converting them to our input token type.
This makes the lexer output look somewhat "ugly" (why would floats be multiple tokens?), but from a compiler-engineering perspective it seems like the more "correct" way of doing things, as it avoids lexer hacks.
Option B – put a 'lexer hack' in the parser: Another alternative would be to allow the parser to modify its input tokens while it's parsing them. Presumably we could tuck away the code that does this somewhere, to avoid having the parser "know" token contents, and just have it call try_split_float_into_dot_separated_ints. Sadly this doesn't seem to be immediately doable either, because the parser just reports events to the caller, such as "consume N raw tokens and build a node of type X", where the whole float literal corresponds to a single "raw token" provided by the caller, so we can't tell it to "consume half of this token". Also, this would need changes to Input so that all users of the parser provide the text of each token.
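For concreteness, the helper could look something like this (the name comes from the comment above; the signature and placement are guesses, not an actual rust-analyzer API):

// Split "1.2" into ("1", "2") when both halves are plain decimal digits,
// i.e. when the "float" is really two tuple indices around a dot.
fn try_split_float_into_dot_separated_ints(text: &str) -> Option<(&str, &str)> {
    let (lhs, rhs) = text.split_once('.')?;
    let is_index = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    if is_index(lhs) && is_index(rhs) {
        Some((lhs, rhs))
    } else {
        None
    }
}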
rustc and syn both implement option B, because they are both lossy one-way parsers and that's simplest for them. I'm not quite sure which solution would be architecturally simpler for us but would appreciate some input, because I don't like this bug and would like to get it fixed :)
Sadly the fix caused a lot of downstream bugs, so I've reverted it in https://github.com/rust-lang/rust-analyzer/pull/12241 for now
Another old related issue: https://github.com/rust-lang/rust-analyzer/issues/5429
I've implemented an even more hacky "Option B" from above here: https://github.com/cynecx/rust-analyzer/commit/cc7ba822ae6062f463acb4cd954ddb97e9e043a0. Fair warning: It is far from being upstreamable or PR worthy, but works if you really need it.
Hmm, looking at this, I wonder if we can just encode the split as an extra event with option B? The main question is how to encode the split position (or if we have to just split on demand later)
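One possible shape for that (a sketch of the idea in the comment above, not necessarily what an eventual fix would look like): add an event variant that tells the tree builder to split the current float token at its dot.

// Sketch: parser events, extended with a float-split marker.
enum Event {
    // "consume n_raw_tokens raw tokens as a token of this kind"
    Token { kind: SyntaxKind, n_raw_tokens: u8 },
    // "the current raw token is a float like `0.0` or `0.`; split it at the
    // dot"; `ends_in_dot` distinguishes `0.` (no trailing index).
    FloatSplitHack { ends_in_dot: bool },
}

// Stand-in for the parser's real SyntaxKind type.
#[derive(Clone, Copy)]
struct SyntaxKind(u16);

With a flag like ends_in_dot, the split position doesn't need to be encoded at all: the dot can be found by scanning the token text on demand.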
https://github.com/rust-lang/rust-analyzer/pull/14084 mostly fixes this. We still parse it incorrectly when it's passed as macro input where an expression fragment is expected.