tree-sitter-c-sharp
tree-sitter-c-sharp copied to clipboard
Allow contextual keywords to be used as identifiers
This is a tracking issue to see the progress on validating which contextual keywords can be used as identifiers. All contextual keywords are listed below from https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/#contextual-keywords.
Tree sitter is performing context aware lexing, so some of the keywords can already be used as identifiers even without listing them in _contextual_keywords. Some of them need special handling, and some of them are never going to be correctly handled. Examples for the latter:
varinimplicit_typecould also be parsed as a named type.nint,nuintinpredefined_typecould also be parsed as a named type.notnullintype_parameter_constraintcan be parsed as anotnullconstraint or a type constraint.- ... In these cases we should choose the most likely option as the preferred parse rule.
Keywords:
- [x] add
- https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279
- [ ] alias
- [ ] and
- [x] args - currently treated as any other identifier. Similar to
value. - [ ] ascending
- [ ] async
- [ ] await
- [ ] by
- [ ] descending
- [x] dynamic
- https://github.com/tree-sitter/tree-sitter-c-sharp/pull/281
- [ ] equals
- [x] file
- https://github.com/tree-sitter/tree-sitter-c-sharp/pull/277
- [ ] from
- [x] get
- https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279
- [ ] global
- [ ] group
- [x] init
- https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279
- [ ] into
- [ ] join
- [ ] let
- [ ] managed
- [x] nameof
- https://github.com/tree-sitter/tree-sitter-c-sharp/pull/281
- https://github.com/tree-sitter/tree-sitter-c-sharp/pull/290
- [ ] nint
- [ ] not
- [ ] notnull
- [ ] nuint
- [ ] on
- [ ] or
- [ ] orderby
- [ ] partial
- [ ] record
- [x] remove
- https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279
- [ ] required
- [x] scoped
- https://github.com/tree-sitter/tree-sitter-c-sharp/pull/275
- [ ] select
- [x] set
- https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279
- [ ] unmanaged
- [x] value - currently treated as any other identifier. Syntax highlighting could be improved to highlight it as
@keywordinsideinit/setaccessors. - [x] var
- https://github.com/tree-sitter/tree-sitter-c-sharp/issues/163
- https://github.com/tree-sitter/tree-sitter-c-sharp/pull/280
- [ ] when
- [ ] where
- [ ] with
- [ ] yield
Do you think it is likely that each keyword is going to require additional rules and scenarios much like scoped has?
My concern here is how much this is going to blow the grammar up in terms of size/complexity. I think we are already the biggest tree-sitter grammar and if the PR for the 'scoped' rule is anything to go by this could really cause a big expansion not just in terms of maintaining it but also in terms of conflict rules and execution overhead/parser size (e.g. the current LFS warning)
This is going to come down I guess to the trade-off between being very strict/accurate vs being fast/lightweight. Given that Roslyn exists for the former there was definitely a push from Max to be the latter especially given the current usage of tree-sitter to provide fast syntatical analysis over a huge amount of code at GitHub for the semantic feature.
Worth a discussion.
I don't have a definitive answer to this. I hoped that most of these wouldn't even be required in _contextual_keywords, but that doesn't seem to be the case so far.
These are the ones that I already looked at and might give some rough idea on the changes needed:
scoped: as mentioned above, needs additional rulespartial,required: probably doesn't need additional rules (based on this commit)yield: probably doesn't need additional rules (based on this commit)var,nint,nuint: needs additional rules (see here)async: needs additional rules (see here)
Some of these "contextual keyword as identifier" problems seem more important than others. For example file should be allowed as a parameter name and as an identifier anywhere a local variable can show up. (I'm not sure if this is already the case.) var, group, from, set, value are somewhat similar, but probably less frequent.
I don't know anything about the performance impact of the recent changes that have been introduced. I guess these can be two fold: (i) compiling the parser might take longer, (ii) the performance of the parser degrades. How much do we care about (i)? Should we add some tests for (ii)?
Regarding grammar size and complexity: I agree with you, I think it's already too big. (And somewhat fragile) At the same time, we're still mixing up generic types and expressions (mentioned in https://github.com/tree-sitter/tree-sitter-c-sharp/issues/176), which somewhat saddens me as even a simple syntax highlighting task would be inaccurate. And fixing it will probably make the grammar even more complex.
I continued looking into this a bit more:
filedidn't need any changes https://github.com/tree-sitter/tree-sitter-c-sharp/pull/277get,set, ... accessor keywords could be removed from the list, and also from the conflicts table. https://github.com/tree-sitter/tree-sitter-c-sharp/pull/279valueis not a dedicated token in the grammar, so it needed no change.dynamicandnameofwere only present in_contextual_keywords, so they could be removed without any other changes - https://github.com/tree-sitter/tree-sitter-c-sharp/pull/281
I just came across this tweet, which shows a great test case for contextual keywords: Sharplab link.
The interesting bits:
class where { }
class String {
public static implicit operator String(dynamic var) => null;
}
class dynamic { }
public class var {
dynamic dynamic => null;
void record(where String = null, int y = default)
{
String s = this.dynamic;
}
async Task<int> M() {
var v = new();
await v.async(new async());
return 0;
}
async async async(async async) =>
await async;
}
class await : INotifyCompletion {
...
}
class async {
public await GetAwaiter() => throw null;
}