Improving default string globalization experience
Draft design document for how we can flatten the learning curve for using string, removing globalization concepts for developers who needn't be exposed to them, and widening the pit of success for people calling string-based APIs.
This is a draft design document and does not represent a final plan or committed work.
See also:
- https://github.com/dotnet/runtime/issues/43956
- https://github.com/dotnet/runtime/issues/30626
- https://github.com/dotnet/runtime/issues/14065
Can't we have e.g. CompareToNormal as opposed to CompareToString to align with members in StringComparison and StringComparer?
With the naming convention above, we can omit repeating String in the method names and have xxx for char/StringComparison overloads, xxxNormal and xxxIgnoreCase, instead of xxx for char/StringComparison overloads, xxxIgnoreCase for char overloads, xxxString, and xxxStringIgnoreCase.
Maybe we can even provide char overloads for xxxNormal for uniformity.
Also ContainsIgnoreCase is missing.
Can't we have e.g. CompareToNormal as opposed to CompareToString to align with members in StringComparison and StringComparer?
The question here is whether, for a developer new to .NET, the name CompareToNormal would be better than CompareToString. The term Normal is the best we can find to fit in with StringComparison, but we welcome better names if there are any.
@GrabYourPitchforks should this be marked with api-ready-for-review so we can discuss this?
I expect @stephentoub would want to have a chance to be part of API review before anything's definitely decided, but no doubt helpful to start discussions.
@terrajobst I suspect if we wanted to have a public discussion, it should be a design-specific discussion with no expectation that we would finalize an API immediately. We'd want Steve and usability experts to be involved prior to finalization. This document also suggests usability testing to help test our theories.
While I appreciate the goal of making it easy to do the correct thing and avoid accidental use of linguistic operations, I have a lot of concerns here about ergonomics and poor user experience. You already mention the new developer using an old tutorial and seeing warnings. But what about experienced developers who move to a newer .NET and are confused or frustrated that overloads they know exist (and are used to using, because they are the "correct" way to do things as of right now) don't show up in IntelliSense because they have been made EBNever?
I do have to wonder whether, for application code, users would not be better served by having an easy way to set the default culture of all threads to some special variant of the invariant culture that does ordinal comparisons, equality, etc. instead of the Unicode default algorithms. This would actually make the existing APIs generally do the right thing, and would not seriously regress things for people creating GUI apps. Sadly, such users will always need to decide for each case whether they want a linguistic operation or an ordinal one. This design doc says "It is not reasonable to expect the average developer to understand the difference between ordinal and linguistic behavior", but GUI app developers kind of do need to understand it. For things like parsing file formats, they often want ordinal, but for things related to user input, they often want linguistic.
Obviously libraries would still need to avoid accidentally using linguistic methods if they really want ordinal, unless the libraries will only be used by programs that use this special default culture.
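To make the ordinal/linguistic distinction discussed above concrete, here is a minimal sketch using only existing `StringComparison` overloads. The sample strings are illustrative, and the linguistic result assumes the runtime has full globalization data loaded (in invariant-globalization mode, culture-aware comparisons fall back to ordinal behavior).

```csharp
using System;

// Ordinal comparison looks at UTF-16 code unit values: 'a' (0x61) sorts after 'B' (0x42).
int ordinal = string.Compare("a", "B", StringComparison.Ordinal);
Console.WriteLine(ordinal > 0); // True

// Linguistic comparison follows collation rules, where "a" is expected to sort
// before "B" (assumes full globalization data; see the note above).
int linguistic = string.Compare("a", "B", StringComparison.InvariantCulture);
Console.WriteLine(linguistic);

// For file-format or protocol work, spelling out an ordinal comparison removes
// any dependency on the current culture.
bool isText = "REPORT.TXT".EndsWith(".txt", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(isText); // True
```

Sorting a list box for display is a case for the linguistic overloads, while matching a file extension is a case for the ordinal ones.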
I love the goal of this proposal but the new *String method names seem odd/ugly. Given the prevalence of these methods, would it be worth considering more radical methods to "fix" the current API, e.g. runtime changes to enable assembly-level default comparison?
I do have to wonder whether, for application code, users would not be better served by having an easy way to set the default culture of all threads to some special variant of the invariant culture that does ordinal comparisons, equality, etc. instead of the Unicode default algorithms. This would actually make the existing APIs generally do the right thing, and would not seriously regress things for people creating GUI apps.
We considered but rejected this proposal. The reason is that it would affect more fundamental APIs like int.ToString, which we generally do want to match to the user's current locale by default. It would also significantly impact people writing GUI apps, who rely on things like culture-aware sorting when displaying items in a list box.
I am positive towards widening the pit of success when it comes to string APIs. No developer expects string APIs to behave differently depending on a global thread-static variable (the current culture) until the first long night of debugging caused by a localization bug.
I am concerned about the string interpolation feature in C# and F# and how it can be made less dependent on the current culture. I realize there's a FormattableString.Invariant() method, but I doubt I can get developers to type FormattableString.Invariant($"Hello {x}") over $"Hello {x}", as a large part of the charm of string interpolation is its succinctness, which is then lost.
If possible, I would like to see an optional analyzer that warns on all usages of string interpolation without FormattableString.Invariant, so it's at least easy to find all the places.
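For reference, a short sketch of the options that exist today for culture-independent interpolation. `FormattableString.Invariant` is the method named above; the `string.Create` overload taking an `IFormatProvider` is an alternative available from .NET 6 onward. The variable `x` and its value are illustrative only.

```csharp
using System;
using System.Globalization;

double x = 1.5;

// Explicitly invariant: the result does not depend on the current culture.
string viaHelper = FormattableString.Invariant($"Hello {x}");
Console.WriteLine(viaHelper); // Hello 1.5

// .NET 6+ alternative: pass the format provider alongside the interpolated string.
string viaCreate = string.Create(CultureInfo.InvariantCulture, $"Hello {x}");
Console.WriteLine(viaCreate); // Hello 1.5

// Plain interpolation formats with CultureInfo.CurrentCulture, so the decimal
// separator can differ from machine to machine.
string viaCurrent = $"Hello {x}";
Console.WriteLine(viaCurrent);
```

Both explicit forms are noticeably longer than the plain interpolated string, which is the succinctness cost described above.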
The reason is that it would affect more fundamental APIs like int.ToString, which we generally do want to match to the user's current locale by default.
Being culture-aware in those members is quite problematic:
- int.ToString or int.Parse can't be optimized further since they have to touch the current culture, or access InvariantCulture. Converting between int and string is very likely to be used frequently. Passing CultureInfo.InvariantCulture or NumberFormatInfo.InvariantInfo is even more annoying. double is more impactful.
- Round-tripping of DateTime is a nightmare. Formatting in a ddmmyy culture and parsing in an mmddyy culture can easily lead to bugs, which I have already heard about.
- Usage of IEquatable<string> is practically unavoidable, especially for tuples and records. Manually making strings ordinal in them will lose almost all of their convenience.
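The DateTime round-tripping hazard in the list above can be sketched as follows. To keep the example independent of installed culture data, explicit day-first and month-first patterns stand in for the two cultures; that substitution and the sample date are assumptions of this sketch.

```csharp
using System;
using System.Globalization;

var date = new DateTime(2024, 4, 3); // 3 April 2024

// Round-trip safe: the fixed "o" (ISO 8601) format with the invariant culture.
string iso = date.ToString("o", CultureInfo.InvariantCulture);
DateTime back = DateTime.ParseExact(
    iso, "o", CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
Console.WriteLine(back == date); // True

// Fragile: text written day-first but read month-first parses without error
// and silently swaps the two fields.
string dayFirst = date.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture); // "03/04/2024"
DateTime misread = DateTime.ParseExact(
    dayFirst, "MM/dd/yyyy", CultureInfo.InvariantCulture);
Console.WriteLine(misread.Month); // 3 (March), not April
```

The second case never throws for days 1 through 12, which is why this kind of bug tends to surface late.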
With my personal bias, as a Chinese speaker, casing and sorting are totally nothing in Chinese. The only thing I benefit from being culture aware is displaying DateTime.
We will soon be planning for .NET 8. Is it an appropriate time to pick this up again?
Pinging @dotnet/project-system as we are considering modernising the resource editor in Visual Studio.
Minor suggestion that doesn't change anything fundamental about the discussion: how about String.HasPrefix and String.HasSuffix rather than {Starts,Ends}WithString? English conveniently has (at least) two terse ways to express "B starts with A".
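As a rough illustration of how the suggested names would read at a call site, here is a sketch written as extension methods. `HasPrefix`, `HasSuffix`, and the `StringAffixExtensions` class are hypothetical and do not exist on System.String; forwarding to the ordinal `StartsWith`/`EndsWith` overloads is an assumption made for this sketch.

```csharp
using System;

Console.WriteLine("Content-Type: text/plain".HasPrefix("Content-Type")); // True
Console.WriteLine("archive.tar.gz".HasSuffix(".zip"));                   // False

// Hypothetical helpers: these names are not part of .NET today.
public static class StringAffixExtensions
{
    // Assumed semantics: ordinal, case-sensitive prefix match.
    public static bool HasPrefix(this string text, string prefix) =>
        text.StartsWith(prefix, StringComparison.Ordinal);

    // Assumed semantics: ordinal, case-sensitive suffix match.
    public static bool HasSuffix(this string text, string suffix) =>
        text.EndsWith(suffix, StringComparison.Ordinal);
}
```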