Only Allow Lexical Keywords in the Language

[x] Proposed
[ ] Prototype: Not Started
[ ] Implementation: Not Started
[ ] Specification: Not Started

Summary

Today there are keywords in the language, such as var or nameof, that cannot be understood with just lexical information. These keywords operate this way for backwards compatibility reasons. I propose that we change the language so that there exist no keywords that cannot be determined via lexical analysis.

Motivation

In general, C# strives to be a language that is explicit about program behavior and normally requires the developer to write out what their intention is without ambiguities. This makes it a language that is easy for someone to read and understand. Once you learn it there is little implicit behavior to consider. I believe having this "gotcha" where keywords are only keywords if nothing else in scope is so-named makes the language harder to read and reason about in general.

There is also the reality that the design goals around C# Language versions and .NET Framework versions have changed. In the past it was paramount that developers could take a new language version update without updating the framework version they were targeting. With language features being increasingly tied to the runtime this makes less sense. We now strongly encourage developers to update both the language version and target framework together.

Detailed design

There is an existing concept in the language called "contextual keywords". I am not proposing doing away with this concept altogether, just changing it so that a keyword's "contextual-ness" can always be determined lexically. Take the new (at the time of this writing) keyword record. We can still know whether we are referring to the record keyword or some identifier named record based on the lexical context; there is no ambiguity. However var, according to the spec, requires us to check if there are types named var in scope:

spec

In the context of a local variable declaration, the identifier var acts as a contextual keyword. When the local_variable_type is specified as var and no type named var is in scope,

Similarly, nameof requires checking whether there are any identifiers called nameof in scope:

spec

Because nameof is not a reserved keyword, a nameof expression is always syntactically ambiguous with an invocation of the simple name nameof. For compatibility reasons, if a name lookup of the name nameof succeeds, the expression is treated as an invocation_expression -- regardless of whether the invocation is legal. Otherwise it is a nameof_expression.
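The two spec rules quoted above can be seen in a small program. This is a minimal sketch of today's behavior; the type named var, the method named nameof, and the KeywordLookupDemo class are hypothetical names introduced only for illustration.

```csharp
using System;

// Hypothetical type named "var". Recent compilers may warn about
// all-lowercase type names, but the declaration still compiles.
class var
{
    public static implicit operator var(int value) => new var();
}

static class KeywordLookupDemo
{
    // Hypothetical method named "nameof" that is in scope for the calls below.
    static string nameof(object value) => "user-defined nameof";

    public static string DeclaredTypeName()
    {
        var x = 5; // a type named var is in scope, so x has that type, not int
        return x.GetType().Name;
    }

    public static string NameofResult()
    {
        int count = 42;
        return nameof(count); // name lookup succeeds, so this is an invocation
    }

    static void Main()
    {
        Console.WriteLine(DeclaredTypeName()); // var
        Console.WriteLine(NameofResult());     // user-defined nameof
    }
}
```

Removing the class and the method changes the meaning of both statements without touching either line, which is the non-lexical behavior this proposal is about.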

The implementation of this proposal would remove wording from the spec around name lookup collisions, and have a compliant compiler be able to fully determine keywords given only parsing information.

The following keywords would now produce an error if developers attempted to use them as anything other than a keyword:

var
nameof
dynamic
_

Drawbacks

This is a breaking change: if anyone were relying on this behavior, their code would no longer compile. For cases where a type is named var, dynamic, or _, or a method is called nameof, the developer would need to change the usages to @var, @dynamic, @_, or @nameof.
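As a rough sketch of what such migrated code could look like, the following uses the @ prefix on each affected name. The class, method, and variable names are hypothetical; the @-escaped forms already compile today, so this only illustrates the spelling a developer would switch to.

```csharp
using System;

// Hypothetical type that is literally named "var", written in escaped form.
class @var
{
    public override string ToString() => "instance of the type named var";
}

static class EscapedIdentifierDemo
{
    // Hypothetical method that is literally named "nameof".
    static string @nameof(object value) => "method named nameof";

    public static string Run()
    {
        @var item = new @var();           // explicitly the type, never the keyword
        int @_ = 3;                       // explicitly an identifier, never a discard
        int sum = @_ + 1;
        string viaMethod = @nameof(item); // explicitly the method, never the operator
        return item + " | " + viaMethod + " | " + sum;
    }

    static void Main()
    {
        // instance of the type named var | method named nameof | 4
        Console.WriteLine(Run());
    }
}
```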

Alternatives

We could opt to keep _ as a contextual keyword that depends on name lookup rules as this is the change most likely to break real-world programs (see discussion on https://github.com/dotnet/csharplang/issues/1064)

Unresolved questions

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-28.md#ungrouped
https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-09-06.md#only-allow-lexical-keywords-in-the-language

Feb 24 '21 01:02 jmarolf

Previous discussion: https://github.com/dotnet/csharplang/discussions/4458.

Feb 24 '21 11:02 svick

I feel that the breaking changes that would be introduced by adopting this proposal on its own fall into three categories:

"Nobody is broken." Almost nobody uses var, dynamic or nameof as identifiers in C#, it's fine to break the tiny number of people who do.
"Some people are broken." The pattern where _ is the name of a used lambda parameter (e.g. _ => _.Name) is fairly rare, but not unheard of in C#. I think it's probably acceptable to break this kind of code, but ways of softening the blow should be seriously considered (e.g. a code fix to rename such parameters; or warning in C# 10 and only making it an error in C# 11).
"Lots of people are broken." The pattern where _ is the name of an unused lambda parameter (e.g. _ => {}) is very common in C# and it is completely unacceptable to break such code. The obvious solution would be to change the meaning of that code to make the _ a discard. But I think it's important to note that this additional change would be required, assuming this proposal is not meant to be a massive breaking change.
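For reference, the two lambda patterns in the second and third categories look like this in code that compiles today. This is a minimal sketch; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

static class UnderscoreLambdaDemo
{
    public static int[] UsedParameter()
    {
        string[] words = { "a", "bb", "ccc" };
        // "_" is read inside the body, so it is a real identifier here.
        return words.Select(_ => _.Length).ToArray();
    }

    public static int UnusedParameter()
    {
        int calls = 0;
        // "_" is never read; it exists only to satisfy the Action<int> signature.
        Action<int> callback = _ => { calls++; };
        callback(10);
        callback(20);
        return calls;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", UsedParameter())); // 1,2,3
        Console.WriteLine(UnusedParameter());                 // 2
    }
}
```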

Feb 24 '21 11:02 svick

Will this break analyzers/codefixes?

Feb 24 '21 11:02 Youssef1313

@svick looking at github we have:

~17K uses of _ =>
~1K uses of var _ = or using var _ =

While I am only proposing removing name lookup and var _ = GetResults(); is not lexically ambiguous with _ = GetResults(); there could potentially be odd errors with _ in this case. I am willing to say we keep the name lookup rules for _ if there are concerns.

Feb 24 '21 17:02 jmarolf

Will this break analyzers/codefixes?

That is entirely dependent on the compiler implementation, but new language versions are always allowed to break analyzers.

Consider when expression-bodied members were added. If you previously assumed (not unreasonably) in your analyzer that all methods with a body contained a block syntax, you were broken, since now the body could just be an arrow expression.

Feb 24 '21 17:02 jmarolf

See: https://github.com/dotnet/csharplang/discussions/4466

Conflating the two was always going to create a lot of confusion, and the parser that gets broken/confused the most is the human parser.

Feb 25 '21 15:02 HaloFour

I've switched my position on https://github.com/dotnet/csharplang/issues/1064 (disallowing _ as an identifier) from downvote to upvote. Sure, it would break the past years of my code in which I used this for lambda parameters, but ship a solution-wide light bulb fix for it and I'd want to use that light bulb fix anyway to replace my usages of _, even if I wasn't forced to. Opting into a new major version of C# feels like an expected time for something like this to happen.

Maybe soft-deprecating by adding a new compiler warning in C# 10 that tells you that _ as an identifier will be disallowed starting in C# 11 would make this seem less abrupt.

I agree with @HaloFour. I think the human parser should be the most important factor. While unlikely, the possibility of code like this should make us uneasy:

public class Foo
{
    private int _;

    // Doing something important in some other file that is affected by reading Foo.WrappedValue?
    public int WrappedValue => _;

    public bool IsNumber(string input)
    {
        return double.TryParse(input, out _); // oops!
    }
}

(stolen from http://gafter.blogspot.com/2017/06/making-new-language-features-stand-out.html?showComment=1509474504510#c7458806139970524286)

Feb 28 '21 17:02 jnm2

I hope this proposal goes nowhere. I like the underscore usage because it makes the code less boring. As a writer, I like to use dashes, colons, semicolons, etc. to make my writing more interesting, just as an _ makes the code more interesting, though it's rarely used. I want C# to be an intermediate-level language; developers having trouble with var and the like should look at Lua or similar.

Apr 02 '21 13:04 chrizpro

Java, which is used by a larger (and some would argue more resistant to change) community seemingly has had little/no issue with the language deprecating and then disallowing the use of _ as an identifier or var as a type name. They've done this in the past as well with names like assert. Usually it doesn't matter as the contextual keyword wouldn't be expected to be used as a type name.

I've been pretty outspoken against the use of _ as a discard as well as an identifier. Since the ship has sailed on discards I think its use as an identifier should be reconsidered. It sounds good on paper to avoid reinterpreting/breaking any existing code, but now the language has this wart where developers need to remember which combination of features will cause the compiler to prefer _ as an identifier vs. where the compiler will always consider it a discard, and where _ is preferred to be an identifier the developer has to wade past the type checks on these "variables" that the developer never intended to actually use. The case of accidentally overwriting some field name might be pathological, but the mental burden on the developer will still always be there. I would've much preferred if the compiler phased out _ as an identifier over a few releases, with fixers to replace it with another identifier, and then switch it to a discard wholesale. Names are cheap. Contextual keywords that change their meaning based on nuanced use of other language features and where the contexts are very likely to collide are not.

Apr 02 '21 13:04 HaloFour

My views are basically the same as @HaloFour's. I think semantically contextual keywords make a lot of sense in theory but add developer complexity and overhead for little benefit.

Apr 02 '21 21:04 jmarolf

for little benefit.

I think this is very debatable. Consider the work we're doing to support field inside properties in C# 10. If we make these keywords and not contextual keywords, we simply break people (including ourselves). And we break them despite them having done nothing wrong. For example, it would break code that is totally normal and reasonable and not at all deviating from the norms of the ecosystem. I don't like the idea that someone could follow every best practice we gave, and then end up breaking just for expediency on our part. In most (all?) cases, supporting semantic contextual keywords is not difficult. Indeed, it's one of the simpler things to support. You simply bind as normal and accept the prior meaning if it is valid. If it isn't, then you allow the new meaning. This means we can gently add new things to the language and not have to worry at all about breaking people.
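A minimal sketch of that "bind first, fall back to the keyword" rule, assuming a toy scope modeled as a set of names. The Meaning enum and the Resolve method are hypothetical and are not Roslyn APIs; a real compiler performs full name lookup rather than a set membership test.

```csharp
using System;
using System.Collections.Generic;

enum Meaning { ExistingSymbol, ContextualKeyword }

static class SemanticKeywordSketch
{
    // If ordinary lookup finds the name, keep the prior meaning;
    // only when lookup fails is the new keyword meaning allowed.
    public static Meaning Resolve(string name, ISet<string> namesInScope) =>
        namesInScope.Contains(name) ? Meaning.ExistingSymbol : Meaning.ContextualKeyword;

    static void Main()
    {
        var withField = new HashSet<string> { "field", "count" };
        var withoutField = new HashSet<string> { "count" };

        Console.WriteLine(Resolve("field", withField));    // ExistingSymbol
        Console.WriteLine(Resolve("field", withoutField)); // ContextualKeyword
    }
}
```

The point of contention in this thread is visible in the signature: the answer depends on namesInScope, which is semantic information, not on the token alone.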

Apr 02 '21 21:04 CyrusNajmabadi

Consider the work we're doing to support field inside properties in C# 10.

For clarification: this proposal explicitly states that cases like this should keep working as they do, contextual keywords will always exist. Just because properties use the value contextual keyword does not mean that we should now force that to be a keyword at all times. This proposal is about ensuring that contextual keywords can always be determined based solely on lexical information as opposed to semantic information.

In most (all?) cases, supporting semantic contextual keywords is not difficult. Indeed, it's one of the simpler things to support. You simply bind as normal and accept the prior meaning if it is valid. If it isn't, then you allow the new meaning. This means we can gently add new things to the language and not have to worry at all about breaking people.

I totally agree that there is no engineering reason to change this. It just works for the compiler folks (as far as I am aware). But I think it adds an unnecessary burden on programmers using the language. Things like var and nameof feel very unfortunate to me. Anyone following best practices in C# does not expect var to be unavailable to them or for nameof to have different semantics based on esoteric name lookup rules. It feels like a real "gotcha" moment where I can go on Twitter and "well actually" anyone that uses code with these semantic contextual keywords and say "Oh you are actually not discarding that but assigning a value to a variable named _".

No other language I've encountered (C++, Java, TypeScript, Python, Go, and also F# and Visual Basic) uses name lookup rules to determine whether something is a keyword, and there have been no complaints. I personally feel that all this concern over keyword breakage has no real evidence; it's all theoretical. Java can just add a new keyword if they need to and no one complains.

Apr 02 '21 22:04 jmarolf

This proposal is about ensuring that contextual keywords can always be determined based solely on lexical information as opposed to semantic information.

Right. But the problem with that is that it directly goes against design goals we have for these features. For example, we want you to just be able to say field. There's nothing lexical/syntactic to distinguish that this is special. It's just going to reference the auto-prop field if nothing else binds.

But I think it adds an unnecessary burden on programmers using the language

I don't really see this as a burden. For people just using the language, using var is going to work. So what needs to be fixed? Same with nameof, etc. People using our APIs could certainly be better served here with better APIs, but that would be a Roslyn concern.

and there have been no complaints.

This is not true. Take Go, for example. There are lots of complaints about the verbosity of the language. And part of that verbosity arises because the language doesn't want to get into this space. So it ensures all its constructs are extremely verbose and often unwieldy, just so it doesn't have to do any semantic checks on this sort of thing. It's a tradeoff they made, but one we're quite loath to make, as it really just bulks up the language.

Apr 02 '21 22:04 CyrusNajmabadi

Right. But the problem with that is that it directly goes against design goals we have for these features. For example, we want you to just be able to say field. There's nothing lexical/syntactic to distinguish that this is special. It's just going to reference the auto-prop field if nothing else binds.

I would need to review the proposal, but isn't this going to work exactly like value? You can just say that field is reserved now and you use @field if you need to "escape" the fact that this is a keyword now. I think this is an important distinction to the reader. You now have to explicitly state what your intent is. You are essentially saying "a casual reading of this might lead you to believe this is the field keyword, which has specific semantics, but that is not what is happening here; this is a custom instance and @field clues you into what is happening." If we were to do it all over again, would we have everything be a contextual keyword? I dunno, I suppose I could see the argument: why put roadblocks in folks' way? My position is that it's a weird language corner case that most C# developers are not aware of and is surprising to them when they learn about it.

If there is a design goal that can only be achieved with name lookup rules, or everyone else in the LDM just disagrees and thinks that semantic contextual keywords are awesome and we wish we did them more often, great! That's not my position but I am willing to be convinced.

Apr 02 '21 23:04 jmarolf

I would need to review the proposal but isn't this going to work exactly like value?

No. 'value' always binds to the property parameter prior to anything else in a higher scope. field will not (As that would break existing, perfectly fine code).

You can just say that field is reserved now and you use @field if you need to "escape" the fact that this is a keyword now

That would break lots of code that is totally fine today and which wasn't doing anything strange or inappropriate. I do not see how customers are helped by just changing the meaning of their code on them.

You are essentially saying "a casual reading of this

We are not, and should not be, beholden to 'a casual reading of this'.

If you see this:

local = 0;

What does a casual reading tell you? Almost nothing. This could be a local, or a field, or a property, or a parameter. it could be assigned. it could be assigned by-ref. it could have conversions. it could throw. etc. etc. etc.

And that's just assignment. Once you get the . operator, all bets are 100% off :)

Apr 02 '21 23:04 CyrusNajmabadi

My position is that it's a weird language corner case that most C# developers are not aware of and is surprising to them when they learn about it.

Weird corner cases are always like that. But we have tons of those everywhere. The question is: is getting rid of weird corners better or worse than breaking code? The position we've landed on generally comes down to:

is the code that is breaking reasonable? or is it unreasonable?
is it widespread, or likely not used at all?

If it's unreasonable (which often comes down to debate) we are more likely to take the stance: trying to prop up this code is not worth it, so we would prefer to change it and accept that pathological cases break.

Similarly, if something is widespread, then we've already opened the barn door. People clearly are using the language in this fashion in a significant fashion, and I think we have to accept that.

Where we have room to play around is when you get into the 'unreasonable, and not used (or very very rarely used)' territory. This is like someone coming along now and saying: yeah, I'm going to name my type var even though .NET naming conventions (both formal and informal) from day 1 have been that types are PascalCased. This is both unreasonable IMO for someone to do, and likely extraordinarily niche. (Indeed, my expectation is that this only exists in projects that seek to subvert the language/compiler, in which case I don't think of that as a reasonable thing to cater to.)

--

So, in the case of some keywords (var, record, etc.) i'm actually ok with us taking over and saying: yeah, at this point this is ours. Reasonable codebases won't have any pain at all moving to this.

However, for some keywords, i'm not ok with us doing this. If the pattern is either reasonable, or widespread, we need to accept that and not harm users when we have a perfectly suitable way to both introduce the feature and keep things working just fine.

Apr 02 '21 23:04 CyrusNajmabadi

So, in the case of some keywords (var, record, etc.) i'm actually ok with us taking over and saying: yeah, at this point this is ours. Reasonable codebases won't have any pain at all moving to this.

However, for some keywords, i'm not ok with us doing this. If the pattern is either reasonable, or widespread, we need to accept that and not harm users when we have a perfectly suitable way to both introduce the feature and keep things working just fine.

I think this is a totally reasonable stance to take. var feels pretty uncontroversial (to me) but other keywords feel much further along the spectrum of causing unreasonable harm. If the LDM says "var should just be a keyword but these others I think should stay as they are" I would be totally fine with that. I just want us to take the time to re-evaluate this and make sure we still feel the same way.

In the past there were more situations where a newer version of C# could be "pushed" on you. Today it's an explicit decision to update your SDK version to get an updated version of C#. Major SDK versions also have major breaking changes (API name changes etc.) to the point that developers expect some friction. I think it's not unreasonable to have folks change field to @field in these upgrade situations, but I will admit I am taking a stance that is way over to one side on how ok I am with breaks. Others do not need to join me over here.

Apr 02 '21 23:04 jmarolf

If we scope this to "the language reserves the space of lowercase ASCII identifiers for type contexts", then i'm totally ok with that :)

That would address var, unmanaged, notnull, dynamic and possibly some others that i'm not remembering.

Apr 02 '21 23:04 CyrusNajmabadi

I like the scoping here but I would also like to consider the case of discards. As the feature is written today it's hard to use discards broadly in a method and instead is most useful in a limited set of circumstances. In too many cases it subtly turns into an identifier, not a discard, and suddenly that invalidates other uses within the method body and suddenly you have to drop back to ignored names.

Apr 04 '21 23:04 jaredpar

I'd def like to break out an issue on discards. I'm curious about the cases that are hard here and where it's difficult to mesh the idea of:

use existing semantics if the code is legal
reinterpret as discard if not

I think discards are also a space where we could potentially experiment with a .NET upgrade style approach where we unilaterally reinterpreted this stuff, but had tools fix the issue if you used these as non-discards in your project.

Apr 04 '21 23:04 CyrusNajmabadi

I'm curious about the cases that are hard here

Converting between lambdas and local functions. Parameters in lambdas can be discards but not in local functions. That means when swapping between the two it introduces unnecessary friction because you have to rationalize discard behavior. It's no longer what essentially amounts to a syntax transform.

Whether a _ is a discard or identifier in a lambda comes down to the count of parameters that you have. A single parameter means it's an identifier but multiple mean it's a discard.

// _ is an identifier 
Action<int> action = (_) => { 
    _ = ""; // Error cause _ is an int identifier 
};

// _ is a discard
Action<int, int> action2 = (_, _) => { 
    _ = ""; // Okay cause this is a discard 
};

This is generally frustrating to have to remember but really gets frustrating when you consider it in the context of refactoring or code changes. Consider that lambda parameters are often listed as discards because they're a callback value that you may not need. Circumstances change and it's rational to begin using a parameter, which begins by assigning it a name. If assigning that parameter a name means there is only one _ remaining, though, then it becomes an identifier, and suddenly all the other _ inside the method body are now interpreted as identifiers, which can cause compilation errors.

string token = ...;
Action<string, string> action = (_, value) => {
    // Error: This worked before I changed the second parameter to have a name
    if (int.TryParse(token, out _)) {
        ...
    }
};

This means, though, that there is a huge incentive to prefer out var _ over out _ even when _ currently refers to a discard. The out var _ form is one of the few places where _ unambiguously refers to a discard. Even though out _ is more succinct, developers should consider always using the out var _ form, which is longer and doesn't actually declare a variable, because it's more future proof against cases where _ gets bound as an identifier.

These together all make it frustrating to use discards. It's too easy to get trapped in a case where _ suddenly binds to an identifier and that will invalidate many other cases in the method where you depended on having discards available and there is little recourse for the developer when that happens.

Apr 04 '21 23:04 jaredpar

Whether a _ is a discard or identifier in a lambda comes down to the count of parameters that you have. A single parameter means it's an identifier but multiple mean it's a discard.

Could we change that and instead make it so that if it's an error with the prior semantics, then it can now be reinterpreted as a discard?

These together all make it frustrating to use discards. It's too easy to get trapped in a case where _ suddenly binds to an identifier and that will invalidate many other cases in the method where you depended on having discards available and there is little recourse for the developer when that happens.

I have a supposition we can fix that, without having to go whole-hog into: all _ are always discards.

The open question for me is if there are cases where code would be legal under either interpretation (identifier or discard), and you want the latter, and interpreting as the former would lead to undesirable behavior. If that exists, then this approach would likely not be viable. However, my hunch is that this would allow for:

existing code to continue to compile with its existing meaning.
Code that is currently in error will now compile, with a meaning that is sensible.
Code that could potentially have both meanings (and this will retain the 'identifier' interpretation) will behave in a desirable way.

Apr 04 '21 23:04 CyrusNajmabadi

@CyrusNajmabadi

Could we change that and instead make it so that if it's an error with the prior semantics, then it can now be reinterpreted as a discard?

I like where this is going but it sounds like there could be a lot of potentially tricky edge cases, especially if the code seems to intentionally mix discards and _ as an identifier:

if (int.TryParse(s, out _)) {
    // ...
}
// later ...
var bar = foo.Select(_ => _.Bar);

Apr 05 '21 12:04 HaloFour

Could we change that and instead make it so that if it's an error with the prior semantics, then it can now be reinterpreted as a discard?

Can't do that because _ is a legal identifier. As @HaloFour pointed out it's just fine to use it via _.ToString(), etc ... You can't even take shortcuts like saying "okay, if _ is only used for assignment or out then make it a discard" because assignments to a _ can have side effects (implicit conversion tricks).
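A minimal sketch of the kind of side effect meant here, assuming a hypothetical Tracked type whose implicit conversion from string increments a counter. Because a single lambda parameter named _ is an identifier, the assignment runs the conversion; if _ were treated as a discard, no conversion to Tracked would happen.

```csharp
using System;

// Hypothetical type whose implicit conversion has an observable side effect.
class Tracked
{
    public static int Conversions;

    public static implicit operator Tracked(string text)
    {
        Conversions++;
        return new Tracked();
    }
}

static class DiscardSideEffectDemo
{
    public static int Run()
    {
        Tracked.Conversions = 0;
        // Single-parameter lambda: "_" is an identifier of type Tracked,
        // so the assignment converts the string and bumps the counter.
        Action<Tracked> assign = _ => { _ = "hello"; };
        assign(new Tracked());
        return Tracked.Conversions;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 1
    }
}
```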

This is the core problem we're facing. The decisions of C# 1.0 are essentially limiting our ability to make _ a friction free feature. Unless we take some sort of conditional break here then we're essentially stuck with those decisions.

Apr 05 '21 14:04 jaredpar

Can't do that because _ is a legal identifier.

I'm not talking about the cases where it has legal, error free, semantics.

I'm talking about the cases where it has illegal semantics. For example, where it would cause a scope collision.

This code would be illegal today, and so we can come up with rules to make it legal by saying: ah, read these all as discards now.

Apr 05 '21 15:04 CyrusNajmabadi

I'm unsure what you're asking for at this point.

Apr 05 '21 15:04 jaredpar

In cases where the code compiles today using current rules, preserve the meaning of that code.

In cases where the code does not compile (for example, because of scope collision), allow reinterpretation as a discard.

This is effectively similar to how other semantic identifiers work. If 'nameof' binds, then use that, otherwise it is the semantic keyword.

Except instead of asking if it binds, ask if there is a scoping collision, or it is not found, or things like that. In that case, treat as discard.

Apr 05 '21 15:04 CyrusNajmabadi

That just sounds incredibly dangerous to me. Did you mean to use it as identifier and messed up, or did you actually want to discard? The intent of _ is to make the programmer's intent clear, but this would do the opposite.

Apr 05 '21 15:04 333fred

Like I said, I want to do the mental exercise here.

With the other semantic identifiers the above holds (with the same arguments), but it really didn't turn out to be an issue.

My supposition is that it will be very obvious very quickly.

Apr 05 '21 15:04 CyrusNajmabadi

Binding isn't enough though because it doesn't fix any of the problems I outlined. Today function parameters and single parameter lambdas are always bound as identifiers. Hence we can't take the approach we take with other contextual keywords (it's what we do today)

If we want to take the approach of "discard if it doesn't impact behavior" then that means we have to effectively implement two different binding passes. Because in order to determine if it's legal as discard, by that I mean doesn't change the side effects of the program, then you have to do semantic analysis. Have to understand for example if the current approach has silent implicit conversions. Consider the following as an example:

M(x, _ => { _ = x; });

In order to understand if it is legal to treat _ as a discard or must be preserved as an identifier you must go through a full binding pass. It's completely possible that it binds to the following for which treating _ as a discard in the future would be a breaking change.

void M(Action<string, dynamic> a);

I don't think this is worth doing a double binding pass on methods which is why I'm pushing for other approaches.

Apr 05 '21 15:04 jaredpar
