tree-sitter-c-sharp icon indicating copy to clipboard operation
tree-sitter-c-sharp copied to clipboard

Allow contextual keywords to be used as identifiers

Open tamasvajk opened this issue 2 years ago • 4 comments

This is a tracking issue to see the progress on validating which contextual keywords can be used as identifiers. All contextual keywords are listed below from https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/#contextual-keywords.

Tree sitter is performing context aware lexing, so some of the keywords can already be used as identifiers even without listing them in _contextual_keywords. Some of them need special handling, and some of them are never going to be correctly handled. Examples for the latter:

  • var in implicit_type could also be parsed as a named type.
  • nint, nuint in predefined_type could also be parsed as a named type.
  • notnull in type_parameter_constraint can be parsed as a notnull constraint or a type constraint.
  • ... In these cases we should choose the most likely option as the preferred parse rule.

Keywords:

  • [x] add
    • https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279
  • [ ] alias
  • [ ] and
  • [x] args - currently treated as any other identifier. Similar to value.
  • [ ] ascending
  • [ ] async
  • [ ] await
  • [ ] by
  • [ ] descending
  • [x] dynamic
    • https://github.com/tree-sitter/tree-sitter-c-sharp/pull/281
  • [ ] equals
  • [x] file
    • https://github.com/tree-sitter/tree-sitter-c-sharp/pull/277
  • [ ] from
  • [x] get
    • https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279
  • [ ] global
  • [ ] group
  • [x] init
    • https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279
  • [ ] into
  • [ ] join
  • [ ] let
  • [ ] managed
  • [x] nameof
    • https://github.com/tree-sitter/tree-sitter-c-sharp/pull/281
    • https://github.com/tree-sitter/tree-sitter-c-sharp/pull/290
  • [ ] nint
  • [ ] not
  • [ ] notnull
  • [ ] nuint
  • [ ] on
  • [ ] or
  • [ ] orderby
  • [ ] partial
  • [ ] record
  • [x] remove
    • https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279
  • [ ] required
  • [x] scoped
    • https://github.com/tree-sitter/tree-sitter-c-sharp/pull/275
  • [ ] select
  • [x] set
    • https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279
  • [ ] unmanaged
  • [x] value - currently treated as any other identifier. Syntax highlighting could be improved to highlight it as @keyword inside init/set accessors.
  • [x] var
    • https://github.com/tree-sitter/tree-sitter-c-sharp/issues/163
    • https://github.com/tree-sitter/tree-sitter-c-sharp/pull/280
  • [ ] when
  • [ ] where
  • [ ] with
  • [ ] yield

tamasvajk avatar Jan 11 '23 10:01 tamasvajk

Do you think it is likely that each keyword is going to require additional rules and scenarios much like scoped has?

My concern here is how much this is going to blow the grammar up in terms of size/complexity. I think we are already the biggest tree-sitter grammar and if the PR for the 'scoped' rule is anything to go by this could really cause a big expansion not just in terms of maintaining it but also in terms of conflict rules and execution overhead/parser size (e.g. the current LFS warning)

This is going to come down I guess to the trade-off between being very strict/accurate vs being fast/lightweight. Given that Roslyn exists for the former there was definitely a push from Max to be the latter especially given the current usage of tree-sitter to provide fast syntatical analysis over a huge amount of code at GitHub for the semantic feature.

Worth a discussion.

damieng avatar Jan 11 '23 12:01 damieng

I don't have a definitive answer to this. I hoped that most of these wouldn't even be required in _contextual_keywords, but that doesn't seem to be the case so far.

These are the ones that I already looked at and might give some rough idea on the changes needed:

  • scoped: as mentioned above, needs additional rules
  • partial, required: probably doesn't need additional rules (based on this commit)
  • yield: probably doesn't need additional rules (based on this commit)
  • var, nint, nuint: needs additional rules (see here)
  • async: needs additional rules (see here)

Some of these "contextual keyword as identifier" problems seem more important than others. For example file should be allowed as a parameter name and as an identifier anywhere a local variable can show up. (I'm not sure if this is already the case.) var, group, from, set, value are somewhat similar, but probably less frequent.

I don't know anything about the performance impact of the recent changes that have been introduced. I guess these can be two fold: (i) compiling the parser might take longer, (ii) the performance of the parser degrades. How much do we care about (i)? Should we add some tests for (ii)?

Regarding grammar size and complexity: I agree with you, I think it's already too big. (And somewhat fragile) At the same time, we're still mixing up generic types and expressions (mentioned in https://github.com/tree-sitter/tree-sitter-c-sharp/issues/176), which somewhat saddens me as even a simple syntax highlighting task would be inaccurate. And fixing it will probably make the grammar even more complex.

tamasvajk avatar Jan 11 '23 13:01 tamasvajk

I continued looking into this a bit more:

  • file didn't need any changes https://github.com/tree-sitter/tree-sitter-c-sharp/pull/277
  • get, set, ... accessor keywords could be removed from the list, and also from the conflicts table. https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279
  • value is not a dedicated token in the grammar, so it needed no change.
  • dynamic and nameof were only present in _contextual_keywords, so they could be removed without any other changes - https://github.com/tree-sitter/tree-sitter-c-sharp/pull/281

tamasvajk avatar Jan 12 '23 12:01 tamasvajk

I just came across this tweet, which shows a great test case for contextual keywords: Sharplab link.

The interesting bits:

class where { }
    
class String {
    public static implicit operator String(dynamic var) => null;
}

class dynamic { }
public class var { 

    dynamic dynamic => null;
          
    void record(where String = null, int y = default)
    {
        String s = this.dynamic;
    }    
         
    async Task<int> M() {
        var v = new();
        await v.async(new async());
        return 0;
    }
    
    async async async(async async) => 
        await async;
}

class await : INotifyCompletion {
  ...
}

class async { 
    public await GetAwaiter() => throw null;
}

tamasvajk avatar Jan 13 '23 23:01 tamasvajk