Improving default string globalization experience

Open GrabYourPitchforks opened this issue 3 years ago • 16 comments

Draft design document for how we can flatten the learning curve for using string, removing globalization concepts for developers who needn't be exposed to them, and widening the pit of success for people calling string-based APIs.

This is a draft design document and does not represent a final plan or committed work.

See also:

  • https://github.com/dotnet/runtime/issues/43956
  • https://github.com/dotnet/runtime/issues/30626
  • https://github.com/dotnet/runtime/issues/14065

GrabYourPitchforks avatar Apr 17 '21 00:04 GrabYourPitchforks

Can't we have e.g. CompareToNormal as opposed to CompareToString to align with members in StringComparison and StringComparer?

Happypig375 avatar Apr 17 '21 12:04 Happypig375

With the naming convention above, we can omit repeating String in the method names. We would have xxx for char/StringComparison overloads, plus xxxNormal and xxxIgnoreCase, instead of xxx for char/StringComparison overloads, xxxIgnoreCase for char overloads, xxxString, and xxxStringIgnoreCase.

Happypig375 avatar Apr 17 '21 12:04 Happypig375

Maybe we can even provide char overloads for xxxNormal for uniformity.

Happypig375 avatar Apr 17 '21 13:04 Happypig375

Also ContainsIgnoreCase is missing.

Happypig375 avatar Apr 17 '21 13:04 Happypig375

> Can't we have e.g. CompareToNormal as opposed to CompareToString to align with members in StringComparison and StringComparer?

The issue here is whether, for a developer new to .NET, the name CompareToNormal would be better than CompareToString. The term Normal is the best we could find to fit in StringComparison, but we welcome better names if there are any.

tarekgh avatar Apr 17 '21 17:04 tarekgh

@GrabYourPitchforks should this be marked with api-ready-for-review so we can discuss this?

terrajobst avatar Apr 27 '21 00:04 terrajobst

I expect @stephentoub would want to have a chance to be part of API review before anything's definitely decided, but no doubt helpful to start discussions.

danmoseley avatar Apr 27 '21 01:04 danmoseley

@terrajobst I suspect if we wanted to have a public discussion, it should be a design-specific discussion with no expectation that we would finalize an API immediately. We'd want Steve and usability experts to be involved prior to finalization. This document also suggests usability testing to help test our theories.

GrabYourPitchforks avatar Apr 27 '21 04:04 GrabYourPitchforks

While I appreciate the goal of making it easy to do the correct thing and avoid accidental use of linguistic operations, I have a lot of concerns here about ergonomics and poor user experience. You already mention the new developer using an old tutorial and seeing warnings. But what about experienced developers who move to a newer .NET and are confused or frustrated that overloads they know exist (and are used to using, because they are the "correct" way to do things as of right now) don't show up in IntelliSense because they have been made EBNever?

I do have to wonder whether, for application code, users would not be better served by having an easy way to set the default culture of all threads to some special variant of the invariant culture that does ordinal comparisons, equality, etc. instead of the Unicode default algorithms. This would actually make the existing APIs generally do the right thing, and would not seriously regress things for people creating GUI apps. Sadly, such users will always need to decide for each case whether they want a linguistic operation or an ordinal one. This design doc says "It is not reasonable to expect the average developer to understand the difference between ordinal and linguistic behavior", but GUI app developers kind of do need to understand it. For things like parsing file formats, they often want ordinal, but for things related to user input, they often want linguistic.

Obviously libraries would still need to avoid accidentally using linguistic methods if they really want ordinal, unless the libraries will only be used by programs that use this special default culture.

KevinCathcart avatar May 04 '21 14:05 KevinCathcart

I love the goal of this proposal but the new *String method names seem odd/ugly. Given the prevalence of these methods, would it be worth considering more radical methods to "fix" the current API e.g. runtime changes to enable assembly-level default comparison?

mhutch avatar May 04 '21 17:05 mhutch

> I do have to wonder whether, for application code, users would not be better served by having an easy way to set the default culture of all threads to some special variant of the invariant culture that does ordinal comparisons, equality, etc. instead of the Unicode default algorithms. This would actually make the existing APIs generally do the right thing, and would not seriously regress things for people creating GUI apps.

We considered but rejected this proposal. The reason is that it would affect more fundamental APIs like int.ToString, which we generally do want to match to the user's current locale by default. It would also significantly impact people writing GUI apps, who rely on things like culture-aware sorting when displaying items in a list box.

GrabYourPitchforks avatar May 04 '21 17:05 GrabYourPitchforks

I am positive towards widening the pit of success when it comes to string APIs. No developer expects string APIs to behave differently depending on a global thread-static variable (the current culture) until their first long-night debugging session caused by a localization bug.

I am concerned about the string interpolation feature in C# and F# and how it can be made less dependent on the current culture. I realize there's a FormattableString.Invariant() method, but I doubt I can get developers to type FormattableString.Invariant($"Hello {x}") over $"Hello {x}", as a large part of the charm of string interpolation is its succinctness, which is then lost.

If possible, I would like to see an optional analyzer that warns for all usages of string interpolation without FormattableString.Invariant, so it's at least easy to find all the places.

mrange avatar Aug 14 '21 17:08 mrange

> The reason is that it would affect more fundamental APIs like int.ToString, which we generally do want to match to the user's current locale by default.

Being culture-aware in those members is quite problematic:

  • int.ToString or int.Parse can't be optimized further since they have to touch the current culture, or access InvariantCulture. Conversion between int and string is very likely to be used frequently. Passing CultureInfo.InvariantCulture or NumberFormatInfo.InvariantInfo is even more annoying. double is even more impacted.
  • Round-tripping of DateTime is a nightmare. Formatting in a ddmmyy culture and parsing in an mmddyy culture can easily lead to bugs, which I have already heard about.
  • Usage of IEquatable<string> is nearly unavoidable, especially for tuples and records. Manually making the strings in them ordinal would lose almost all of their convenience.

Speaking from my personal bias as a Chinese speaker, casing and sorting mean nothing in Chinese. The only thing I benefit from when it comes to culture awareness is displaying DateTime.

huoyaoyuan avatar Aug 30 '21 21:08 huoyaoyuan

We will soon be planning for .NET 8. Is it an appropriate time to pick this up again?

danmoseley avatar Aug 17 '22 15:08 danmoseley

Pinging @dotnet/project-system as we are considering modernising the resource editor in Visual Studio.

drewnoakes avatar Aug 19 '22 16:08 drewnoakes

Minor suggestion that doesn't change anything fundamental about the discussion: how about String.HasPrefix and String.HasSuffix rather than {Starts,Ends}WithString? English conveniently has (at least) two terse ways to express "B starts with A".

Smaug123 avatar Nov 15 '23 00:11 Smaug123