Tuple struct field is parsed as float
use std::collections::HashMap;

pub struct Graph(HashMap<i32, Vec<i32>>);

impl Graph {
    pub fn new() -> Self {
        Graph(HashMap::new())
    }

    pub fn add_edge(&mut self, u: i32, v: i32) {
        self.0.<|>
    }
}
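For reference, here is one plausible completed body (the HashMap calls are illustrative, not part of the original report); completion after self.0. should be offering exactly these HashMap methods:

use std::collections::HashMap;

pub struct Graph(HashMap<i32, Vec<i32>>);

impl Graph {
    pub fn add_edge(&mut self, u: i32, v: i32) {
        // `self.0` accesses the wrapped HashMap by position; completion
        // after `self.0.` should list HashMap methods such as `entry`.
        self.0.entry(u).or_default().push(v);
    }
}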
In the case of
pub struct S;

impl S {
    pub fn blah(&self) {}
}

struct T(S);

impl T {
    fn foo(&self) {
        self.0.<|>
    }
}
It looks like "0." is treated as a LITERAL:
L_CURLY@[5619; 5620) "{"
WHITESPACE@[5620; 5630) "\r\n "
EXPR_STMT@[5630; 5635)
  FIELD_EXPR@[5630; 5635)
    PATH_EXPR@[5630; 5634)
      PATH@[5630; 5634)
        PATH_SEGMENT@[5630; 5634)
          SELF_KW@[5630; 5634) "self"
    DOT@[5634; 5635) "."
err: `expected field name or number`
err: `expected SEMI`
LITERAL@[5635; 5637)
  FLOAT_NUMBER@[5635; 5637) "0."
WHITESPACE@[5637; 5643) "\r\n "
R_CURLY@[5643; 5644) "}"
Shouldn't struct T(S); be a TupleStruct? I don't see much about them in our code.
That is expected, b/c 0. is a float literal. Note, however, that completion inserts a dummy identifier (see how CompletionCtx is constructed), so that shouldn't be a problem.
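As a rough illustration of that trick (the names here are made up, not the actual rust-analyzer internals): before analysis, completion splices a dummy identifier into a copy of the source at the cursor, so the file parses as if a field name were present.

// Conceptual sketch only; the real CompletionContext logic differs.
fn insert_dummy_ident(source: &str, cursor_offset: usize) -> String {
    let mut patched = String::with_capacity(source.len() + "dummy".len());
    patched.push_str(&source[..cursor_offset]);
    patched.push_str("dummy"); // e.g. "self." becomes "self.dummy"
    patched.push_str(&source[cursor_offset..]);
    patched
}

With the dummy in place, self.0.dummy lexes as self, ., 0, ., dummy (a dot followed by an identifier never gets glued into a float), which is why completion has a chance of working even though 0. lexes as a float in the unpatched file.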
> Shouldn't struct T(S); be a TupleStruct? I don't see much about them in our code.
I haven't looked at this at all, it might be the case that "nothing works b/c nothing is supported", but it shouldn't be too hard to fix. I suggest just tracing back from the completion to see where we get wrong results.
In principle, this should work, but there are no tests for typing tuple_struct.0 correctly, so it might actually be broken for some reason (I haven't investigated further so far)...
The CompletionContext.dot_receiver isn't being filled. It looks like we aren't passing the test at https://github.com/rust-analyzer/rust-analyzer/blob/42a883f06c28ddeab22e5703a578f19110dde7f3/crates/ra_ide_api/src/completion/completion_context.rs#L214
As an aside, I couldn't debug this at all under Windows, and my VM likes to churn at 100% disk usage :D
I have opened #1117 to add inference support to tuple struct indexing. However, this doesn't fix the issue as the lexer still finds a float literal in the autocompletion case and I don't know how to recover and parse it as int + dot.
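For context, tuple-struct indexing inference is about code like the following (a plain-Rust example, not from the issue):

struct Pair(i32, bool);

fn project(p: Pair) -> (i32, bool) {
    // `.0` and `.1` are positional field accesses; inference should
    // assign them the types of the corresponding tuple-struct fields.
    (p.0, p.1)
}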
So the problem here is that we can't properly analyze an incomplete file due to the stupid lexer :) The tree for
{ self.0.<|> }
is
BLOCK@[198; 221)
  L_CURLY@[198; 199) "{"
  WHITESPACE@[199; 208) "\n "
  FIELD_EXPR@[208; 215)
    PATH_EXPR@[208; 212)
      PATH@[208; 212)
        PATH_SEGMENT@[208; 212)
          SELF_KW@[208; 212) "self"
    DOT@[212; 213) "."
    err: `Tuple (struct) field access is only allowed through decimal integers with no underscores or suffix`
    FLOAT_NUMBER@[213; 215) "0."
  WHITESPACE@[215; 220) "\n "
  R_CURLY@[220; 221) "}"
Note that we have a FIELD_EXPR for self., but no expression for self.0. I think the cleanest fix here is to just special-case HIR construction: if a field expression is missing its field name, and the immediately following token is a float literal which ends in ., we should fix it up into a field-access expression. The relevant code is here:
https://github.com/rust-analyzer/rust-analyzer/blob/d55f1136d6444b1f50b9092c36a976d0e1c26202/crates/ra_hir/src/expr.rs#L690
We probably will need to patch completion context as well:
https://github.com/rust-analyzer/rust-analyzer/blob/d55f1136d6444b1f50b9092c36a976d0e1c26202/crates/ra_ide_api/src/completion/completion_context.rs#L203
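A minimal sketch of the detection step that fix-up would need (a hypothetical helper, not the code behind either link): given the text of the float token that follows the dot-terminated FIELD_EXPR, recover the intended tuple index.

// Hypothetical: if a field expression has no field name and the next token
// is a float literal ending in '.', the digits before the dot are the
// intended tuple index (e.g. "0." -> 0).
fn tuple_index_from_float(token_text: &str) -> Option<usize> {
    let digits = token_text.strip_suffix('.')?;
    if !digits.is_empty() && digits.chars().all(|c| c.is_ascii_digit()) {
        digits.parse().ok()
    } else {
        None
    }
}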
Question: why don't we fix the parser/lexer instead?
It seems like this would require the parser to know the text of the tokens, or the lexer to know the preceding token, both of which seem less than ideal. But maybe I am missing some clean way to fix this?
> or the lexer to know the preceding token
We do not need to know the whole previous token, only that the previous char is a DOT. Would it be enough to add a prev function to lex::Ptr and change scan_number to handle that case?
edit: grammar and typo
a single char wouldn't be enough: 0.0..0.9 should parse as a range. In general, I'd avoid adding lookahead/lookbehind to the lexer
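The range case in compilable form (standard Rust, just to spell the point out):

fn main() {
    // The `0` that starts the second float follows a '.' character, yet it
    // must begin a new float token: "previous char is a dot" can't
    // distinguish this from the `self.0.` case.
    let r = 0.0..0.9;
    assert!(r.contains(&0.5));
}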
Triage: completion works here, but the expression is still parsed as a float.

Here's a simple reproduction. The outer .0 projects out the inner tuple (i32,), and the second .0 should then yield the i32 inside; instead, this infers fld as (i32,) when it should be i32:
fn main() {
    let fld = ((0,),).0.0;
}
Can I try to fix this one? :eyes:
Sure :-).
I don't think this is going to be properly fixable without fixing it at the parser/lexer stage. Consider the way tup.0.0 is currently lexed:
FIELD_EXPR@0..7
  PATH_EXPR@0..3
    PATH@0..3
      PATH_SEGMENT@0..3
        NAME_REF@0..3
          IDENT@0..3 "tup"
  DOT@3..4 "."
  FLOAT_NUMBER@4..7 "0.0"
Because 0.0 is its own token, there is no way to refer to tup.0 as a syntax node by itself (even if the parse tree looked different). That would make it an expression with no corresponding source node, which means IDE features wouldn't work on it (extend selection and hover come to mind, but others would probably break too).
I think that leaves 2 options:
Option A – change the lexer: I've played with the idea of changing how floats are lexed at the rustc_lexer stage: they could be split up into a "pre-decimal" part, the ., and a "post-decimal" part, and the parser could then reassemble them when parsing a literal. This might cause problems with proc macros, though, since AFAIK they get direct access to the lexer output. Perhaps we could instead split up float tokens coming from rustc_lexer when converting them to our input token type.
This makes the lexer output look somewhat "ugly" (why would floats be multiple tokens?), but from a compiler-engineering perspective it seems like the more "correct" way of doing things, as it avoids lexer hacks.
Option B – put a 'lexer hack' in the parser: Another alternative would be to allow the parser to modify its input tokens while it's parsing them. Presumably we could tuck away the code that does this somewhere, to avoid having the parser "know" token contents, and just have it call try_split_float_into_dot_separated_ints. Sadly this doesn't seem to be immediately doable either, because the parser just reports events to the caller, such as "consume N raw tokens and build a node of type X", where the whole float literal corresponds to a single "raw token" provided by the caller, so we can't tell it to "consume half of this token". Also, this would need changes to Input so that all users of the parser provide the text of each token.
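For concreteness, the helper could look something like this (the name comes from the comment above; the signature and placement are guesses, not an actual rust-analyzer API):

// Split "1.2" into ("1", "2") when both halves are plain decimal digits,
// i.e. when the "float" is really two tuple indices around a dot.
fn try_split_float_into_dot_separated_ints(text: &str) -> Option<(&str, &str)> {
    let (lhs, rhs) = text.split_once('.')?;
    let is_index = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    if is_index(lhs) && is_index(rhs) {
        Some((lhs, rhs))
    } else {
        None
    }
}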
rustc and syn both implement option B, because they are both lossy one-way parsers and that's simplest for them. I'm not quite sure which solution would be architecturally simpler for us but would appreciate some input, because I don't like this bug and would like to get it fixed :)
Sadly the fix caused a lot of downstream bugs, so I've reverted it in https://github.com/rust-lang/rust-analyzer/pull/12241 for now
Another old related issue: https://github.com/rust-lang/rust-analyzer/issues/5429
I've implemented an even more hacky "Option B" from above here: https://github.com/cynecx/rust-analyzer/commit/cc7ba822ae6062f463acb4cd954ddb97e9e043a0. Fair warning: It is far from being upstreamable or PR worthy, but works if you really need it.
Hmm, looking at this, I wonder if we can just encode the split as an extra event with option B? The main question is how to encode the split position (or if we have to just split on demand later)
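One possible shape for that (a sketch of the idea in the comment above, not necessarily what an eventual fix would look like): add an event variant that tells the tree builder to split the current float token at its dot.

// Sketch: parser events, extended with a float-split marker.
enum Event {
    // "consume n_raw_tokens raw tokens as a token of this kind"
    Token { kind: SyntaxKind, n_raw_tokens: u8 },
    // "the current raw token is a float like `0.0` or `0.`; split it at the
    // dot"; `ends_in_dot` distinguishes `0.` (no trailing index).
    FloatSplitHack { ends_in_dot: bool },
}

// Stand-in for the parser's real SyntaxKind type.
#[derive(Clone, Copy)]
struct SyntaxKind(u16);

With a flag like ends_in_dot, the split position doesn't need to be encoded at all: the dot can be found by scanning the token text on demand.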
https://github.com/rust-lang/rust-analyzer/pull/14084 mostly fixes this. We still parse it incorrectly when it's passed as macro input where an expression fragment is expected.