fsharp icon indicating copy to clipboard operation
fsharp copied to clipboard

Warn on uppercase identifiers in patterns.

Open edgarfgp opened this issue 2 years ago • 18 comments

Fixes #6712, #17878

This PR raises also a warning(behind LanguageFeature.WarnOnUppercaseIdentifiersInPatterns) for uppercase identifiers in patters regardless its length as oppose to the previous hack when it was only raise if the identifier was >= 3

let f Us Uk = () // No warning

let f2 US CA = () // No warning

let f3 U A = () // No warning

let x =    
    match 1 with 
    | US -> "US"  // No warning
    | UK -> "UK" // No warning
    | U -> "U" // No warning

let y = 
    match 2 with 
    | Us -> "Us" // No warning
    | Uk -> "Uk" // No warning
    | U -> "u" // No warning

After

let f Us Uk = () // Warning: Uppercase variable identifiers should not generally be used in patterns, and may indicate a missing open declaration or a misspelt pattern name.

let f2 US CA = () // Warning: Uppercase variable identifiers should not generally be used in patterns, and may indicate a missing open declaration or a misspelt pattern name.

let f3 U A = () // Warning : Uppercase variable identifiers should not generally be used in patterns, and may indicate a missing open declaration or a misspelt pattern name.

let x =    
    match 1 with 
    | US -> "US" // Warning : Uppercase variable identifiers should not generally be used in patterns, and may indicate a missing open declaration or a misspelt pattern name.
    | UK -> "UK" // Warning : Uppercase variable identifiers should not generally be used in patterns, and may indicate a missing open declaration or a misspelt pattern name.
    | U -> "U" // Warning : Uppercase variable identifiers should not generally be used in patterns, and may indicate a missing open declaration or a misspelt pattern name.

let y = 
    match 2 with 
    | Us -> "Us" // Warning : Uppercase variable identifiers should not generally be used in patterns, and may indicate a missing open declaration or a misspelt pattern name.
    | Uk -> "Uk" // Warning : Uppercase variable identifiers should not generally be used in patterns, and may indicate a missing open declaration or a misspelt pattern name.
    | U -> "u" // Warning : Uppercase variable identifiers should not generally be used in patterns, and may indicate a missing open declaration or a misspelt pattern name.

Checklist

  • [x] Test cases added
  • [x] Release notes entry updated

edgarfgp avatar Aug 16 '23 14:08 edgarfgp

I am, honestly, not sure if it's a good idea.

A rule of thumb is - if this will suddently introduce new warnings in existing codebases, it should go under separate version flag or introduce a separate warning, which will be opt-in.

In this particular case - the use cases are country and language codes, which are very common in codebases, and it's not an opt-in warning. I personally thing we should keep it, or make it an additional warning, which will be opt-in.

vzarytovskii avatar Aug 17 '23 11:08 vzarytovskii

I was reading the F# spec and I did not find anything about the check >=3. It only stated the following.

Screenshot 2023-08-17 131434

I think the main for me is that we only get this warning named pattern that is Uppercase and the length is >= 3 which seems confusing to me. my wishful thinking will expect:

  • Dedicated Warning when using named pattern (Pattern matching) Uppercased >= 1
  • Dedicated Warning when using function parameters Uppercased >= 1
  • This will be under a preview lang flag

Update: Now that I know about this. I cannot unsee this:). Maybe @dsyme can give us some historical context.

edgarfgp avatar Aug 17 '23 11:08 edgarfgp

A rule of thumb is - if this will suddenly introduce new warnings in existing codebases, it should go under separate version flag or introduce a separate warning, which will be opt-in.

@vzarytovskii Yeah I will definitely add a lang version flag. I wanted to first understand what were the implications of this check

edgarfgp avatar Aug 17 '23 12:08 edgarfgp

A rule of thumb is - if this will suddenly introduce new warnings in existing codebases, it should go under separate version flag or introduce a separate warning, which will be opt-in.

@vzarytovskii Yeah I will definitely add a lang version flag. I wanted to first understand what were the implications of this check

I'm still not convinced that language flag is a proper way of doing that. It will cause a lot of new warnings in existing codebases when updating to new language version. And only two options will be either to downgrade a language version, or disable the warning.

I think it will instead be a great use case for analyzers sdk.

vzarytovskii avatar Aug 17 '23 12:08 vzarytovskii

I'm still not convinced that language flag is a proper way of doing that. It will cause a lot of new warnings in existing codebases when updating to new language version. And only two options will be either to downgrade a language version, or disable the warning.

I think it will instead be a great use case for analyzers sdk.

Yeah I understand, this check seems to be a terrible hack and dealing with this using an analyser seems a more a bandage. But yeah Happy to close this PR at the very least now I understand the reason of this unexpected compiler behaviour 😓

edgarfgp avatar Aug 17 '23 12:08 edgarfgp

I'm still not convinced that language flag is a proper way of doing that. It will cause a lot of new warnings in existing codebases when updating to new language version.

Well, we have a huge codebase but in the end the amount of code to be fixed was, well, graspable. Also, the original error was for missing the full qualification of DU, which is probably not that common, also judging by the activity in that bug.

Those are just some thoughts, I am sort of on a fence with that. Although at least the collateral changes in this PR are worth merging.

psfinaki avatar Aug 17 '23 13:08 psfinaki

@vzarytovskii @psfinaki Based on the provided feedback, I have reverted to using the original check and:

  • Add a comment to the offending code
  • Update the error message to include the >=3 check and also that it might require qualified access(Please feel free to suggest a more short/clear message)
  • Add more tests.

I think this will, at the very least give more context about this confusing behaviour.

edgarfgp avatar Aug 19 '23 17:08 edgarfgp

Closing this as there does not seem to be a consensus as to how to better deal with this. Feel free to reopen in case you want to take some of the changes in this PR

edgarfgp avatar Aug 22 '23 18:08 edgarfgp

:heavy_exclamation_mark: Release notes required


:white_check_mark: Found changes and release notes in following paths:

Change path Release notes path Description
src/Compiler docs/release-notes/.FSharp.Compiler.Service/9.0.200.md
LanguageFeatures.fsi docs/release-notes/.Language/preview.md

github-actions[bot] avatar Aug 10 '24 16:08 github-actions[bot]

Since it applies to any pattern (including function parameters), we for sure need it under language version, and we'll need to explore a codefix.

vzarytovskii avatar Oct 17 '24 10:10 vzarytovskii

Since it applies to any pattern

It's not. It's controlled by WarnOnUpperFlag flag, which we can extend or adjust, please see https://github.com/dotnet/fsharp/issues/17878#issuecomment-2418637399. I think it may be a bit safer to at least lift it for patterns in match expressions.

auduchinok avatar Oct 17 '24 11:10 auduchinok

Since it applies to any pattern

It's not. It's controlled by WarnOnUpperFlag flag, which we can extend or adjust, please see #17878 (comment). I think it may be a bit safer to at least lift it for patterns in match expressions.

It at least applies to function parameters.

vzarytovskii avatar Oct 17 '24 11:10 vzarytovskii

@vzarytovskii LanguageFeature has been added.

  • Personally I think warning uppercase identifiers in patters regardless its length it the right thing to here.

  • But if we really really need to keep the "hack" like @auduchinok said we can lift it for patterns in match expressions.

  • A code fix will only be useful for identifiers in patterns(where to make these lowercase) expect in match expression. So we will need to add a new error number to trigger it?

edgarfgp avatar Oct 18 '24 09:10 edgarfgp

  • A code fix will only be useful for identifiers in patterns(where to make these lowercase) expect in match expression. So we will need to add a new error number to trigger it?

Don't forget that function arguments are also patterns. This is what was changed in the query code

T-Gro avatar Oct 18 '24 10:10 T-Gro

  • A code fix will only be useful for identifiers in patterns(where to make these lowercase) expect in match expression. So we will need to add a new error number to trigger it?

Don't forget that function arguments are also patterns.

This is what was changed in the query code

Yeah, that's one of the concerns, hopefully not many people have uppercase params.

vzarytovskii avatar Oct 18 '24 11:10 vzarytovskii

  • A code fix will only be useful for identifiers in patterns(where to make these lowercase) expect in match expression. So we will need to add a new error number to trigger it?

Don't forget that function arguments are also patterns. This is what was changed in the query code

@T-Gro Yes. That is why I mentioned that having a separate error number will help trigger the quick fix as oppose.

edgarfgp avatar Oct 18 '24 13:10 edgarfgp

Don't forget that function arguments are also patterns. This is what was changed in the query code

@T-Gro Could you tell more about the queries?

auduchinok avatar Oct 18 '24 13:10 auduchinok

Don't forget that function arguments are also patterns.

This is what was changed in the query code

@T-Gro Could you tell more about the queries?

It's not about queries, Tomáš meant that if you see changes in this PR, you'll see changes to query code which used uppercase parameters, which are technically a pattern and started producing new warning.

vzarytovskii avatar Oct 18 '24 14:10 vzarytovskii

This should only cover all upper case identifiers, pascal case parameters should not produce any warning, since they are much more common, this is much more opinionated than all-upper and we should be dictating it.

Thanks @vzarytovskii for the review. So AAA should but not AaA ?

edgarfgp avatar Oct 27 '24 14:10 edgarfgp

This should only cover all upper case identifiers, pascal case parameters should not produce any warning, since they are much more common, this is much more opinionated than all-upper and we should be dictating it.

Thanks @vzarytovskii for the review. So AAA should but not AaA ?

I think so. We can discuss different cases in separate PRs/issues. As it's common to have such patterns as parameters (much more common than all caps), in one of our internal codebases, there are hundreds on them.

vzarytovskii avatar Oct 27 '24 14:10 vzarytovskii

This should only cover all upper case identifiers, pascal case parameters should not produce any warning, since they are much more common, this is much more opinionated than all-upper and we should be dictating it.

Thanks @vzarytovskii for the review. So AAA should but not AaA ?

I think so. We can discuss different cases in separate PRs/issues. As it's common to have such patterns as parameters (much more common than all caps), in one of our internal codebases, there are hundreds on them.

This should only cover all upper case identifiers, pascal case parameters should not produce any warning, since they are much more common, this is much more opinionated than all-upper and we should be dictating it.

Thanks @vzarytovskii for the review. So AAA should but not AaA ?

I think so. We can discuss different cases in separate PRs/issues. As it's common to have such patterns as parameters (much more common than all caps), in one of our internal codebases, there are hundreds on them.

Humm Im not sure. I think if we truly want fix this we should prevent users from using any identifier pattern that starts with Uppercase.


// Bad
let f Us UK Can = () 

// Good
let f us uK can = () 

// Bad
type Class ()=
    new(Item1) = { item1 = Item1 }

// Good
type Class ()=
    new(item1) = { item1 = item1 }

// Bad
type Class (Us, UK)=

// Good 
type Class (us, uK)=
 
// Bad
try ()
with Uks -> ()

// Good 
try ()
with uks -> ()

Same applies for union cases and exception labels

Please note all these samples already warn if length > =3 expect type Class (Us, UK)=

edgarfgp avatar Oct 27 '24 14:10 edgarfgp

This then will need an explicit approval form @dsyme

vzarytovskii avatar Oct 27 '24 14:10 vzarytovskii

I am late to the party, but here are my thoughts.

The spec says that named patterns have one of the following forms:

Long-ident
Long-ident pat
Long-ident pat-params pat

I think we are talking here only about the first form.

For bindings and function parameters, this form is ALWAYS a variable pattern (and always a simple ident). I see no reasons to disallow or warn about uppercase names here.

Match cases are different, here the first form (if it is a simple ident) can be either a constant or a variable. The spec says that it is a variable if it is lowercase or if it is uppercase and cannot be resolved to a constant. And that in the latter case "a warning is recommended if the length of the identifier is greater than two".

So, for me this looks like we are all fine from a spec point of view.

I am not sure if prescribing lowercase for all bindings and parameters would have been a good idea in the original design of the language, but I would certainly not go there at this point in history.

We CAN improve the existing warnings, as e.g. proposed in the OP of #6712.

Martin521 avatar Oct 27 '24 22:10 Martin521

I am late to the party, but here are my thoughts.

The spec says that named patterns have one of the following forms:

Long-ident
Long-ident pat
Long-ident pat-params pat

I think we are talking here only about the first form.

For bindings and function parameters, this form is ALWAYS a variable pattern (and always a simple ident). I see no reasons to disallow or warn about uppercase names here.

Match cases are different, here the first form (if it is a simple ident) can be either a constant or a variable. The spec says that it is a variable if it is lowercase or if it is uppercase and cannot be resolved to a constant. And that in the latter case "a warning is recommended if the length of the identifier is greater than two".

So, for me this looks like we are all fine from a spec point of view.

I am not sure if prescribing lowercase for all bindings and parameters would have been a good idea in the original design of the language, but I would certainly not go there at this point in history.

We CAN improve the existing warnings, as e.g. proposed in the OP of #6712.

@Martin521 Binding values are allow to start with uppercase or all uppercase so that is not the problem. It is function, constructor parameters that currently only warn the if its length >= 3 which is what this PR is trying to fix.

The spec does not mention anything about the length >=3 requirement. IMO having a consistent requirement to use lowercase when you use a pattern identifier is better and is what this PR proposing. Please have a look the tests.

edgarfgp avatar Oct 27 '24 22:10 edgarfgp

The spec does not mention anything about the length >=3 requirement.

The last sentence of the spec section is "Such a warning is recommended if the length of the identifier is greater than two."

For matches against very short names, the risk of confusion is lower. You typically don't have very short names for exported union cases, exception labels, active pattern case names or literals. (With few exceptions like Ok and PI.) So, this restriction of having warnings (for matches) only for longer uppercase names, arguably makes some sense.

For function parameters, I would rather remove the warning also for longer names, because it does not make sense. It makes sense only for matches. In matches, there is the following risk of confusion: you want to match against a union case, but because of a missing open or a typo, it becomes a variable pattern. Without a warning, this can be very hard to understand, see e.g. #6712. For function parameters and bindings, this risk does not exist.

Martin521 avatar Oct 28 '24 06:10 Martin521

For function parameters, I would rather remove the warning also for longer names, because it does not make sense

Yes. I would be onboard with this too.

edgarfgp avatar Oct 28 '24 08:10 edgarfgp

My main issues here (in particular with PascalCase) are: it's opinionated, it doesn't bring any immediate value to existing customers (just force them to fix their code after just updating SDK). It suites more as analyzer, but I will leave the decision to Don, team and community, my vote is that it shouldn't be a on-by-default warning in compiler (either opt-in or an analyzer), specifically for things like parameters (I can see its value for matches though).

vzarytovskii avatar Oct 29 '24 08:10 vzarytovskii

My main issues here (in particular with PascalCase) are: it's opinionated, it doesn't bring any immediate value to existing customers (just force them to fix their code after just updating SDK). It suites more as analyzer, but I will leave the decision to Don, team and community, my vote is that it shouldn't be a on-by-default warning in compiler (either opt-in or an analyzer), specifically for things like parameters (I can see its value for matches though).

@vzarytovskii So we want an analyser to detect a unfortunate hack checking for country language codes ?. Im not convinced.

What I hope to achieve with this PR is:

  • Show the current inconsistencies
  • Propose a "complete" fix
  • Discuss alternatives
  • Open a conversation and see what we can do here.

I will leave this PR open for couple weeks to see if gets some attention, otherwise I would just accept the unfortunate situation and move on :)

edgarfgp avatar Oct 29 '24 09:10 edgarfgp

So we want an analyser to detect a unfortunate hack checking for country language codes ?. Im not convinced.

I'm mostly not talking about the uppercase identifiers (which is unfortunate as well, but will likely be less apparent), but pascal case ones, which will affect more code (as it's shown in the compiler codebase itself and our internal codebases).

I'm, as I wrote, don't think that we should just force users to fix an issue they don't have (we will just force them to rename a bunch of stuff, since the automatic codefix will be tricky, and will have to take shadowing and scopes into account).

There are realistically two types of code which this warning will be surfacing in:

  1. Actual pattern matching - there it might be uncovering real issues.
  2. Parameters and bindings patters - where it will likely be 99% just noise, where we will be just effectively forcing an opinionated code style, and making users to change their code in sake of changing it (or forcing them to ignore the warning).

That's why I'd like to hear from @dsyme and @T-Gro in particular.

vzarytovskii avatar Oct 29 '24 10:10 vzarytovskii

@vzarytovskii

In the spirit of making some progress here. I have reverted the breaking changes as I understand that there is not way to have a proper fix for this as it will break people's code or will produce unnecessary noise.

Before

match 1 with 
| US -> "US" // No Warning
| UK -> "UK" // No Warning
| U -> "U" // No Warning

After

match 1 with 
| US -> "US" // Warning 
  ^^
| UK -> "UK" // Warning
  ^^
| U -> "U" // Warning
  ^
  • We will only raise a warning in the context of True match case clause. meaning | pat ->
  • We will not not longer raise a warning of function, constructor parameters and all of the other cases that are not | pat ->

For more info check the tests.

edgarfgp avatar Oct 31 '24 13:10 edgarfgp