roslyn
roslyn copied to clipboard
Promote "Convert to interpolated string" to an analyzer/fixer
Summary
The current "Convert to interpolated string" refactoring is really helpful, but it can't be enabled for builds and it doesn't have a fix-all. Ideally it would be converted into an IDEXXXX or CAXXXX analyzer/fixer.
Background and Motivation
"Convert to interpolated string" used to be primarily about code style, but with the improvements to interpolated strings that came in .NET 6 and C# 10, it's now a performance feature. Anywhere code is doing string.Format("...", ...)
, it's leaving performance (throughput and allocation) on the table by not using an interpolated string.
Proposed Feature
P0: Take the existing functionality and just make it an analyzer/fixer with fix-all support. That way, for example dotnet/runtime can enable it in builds to flag any places we're missing out on free wins, and also auto-fix everything so that we don't have to do the painstaking transformations manually.
P1:
Since the refactoring was initially written, the new string.Create(...)
method was introduced, which enables supporting culture. Currently the refactoring supports string.Format("...", ...)
and string.Format(null, "...", ...)
, but it gives up on e.g. string.Format(CultureInfo.InvariantCulture, "...", ...)
. That can now be fixed into string.Create(CultureInfo.InvariantCulture, $"...")
.
P2:
Other APIs now have interpolated string handlers that allow for similar efficiency gains. In particular, StringBuilder
has long provided AppendFormat
methods that take the same arguments as string.Format
, and appends the results to the builder. As of .NET 6, StringBuilder.Append
and StringBuilder.AppendLine
have overloads that take interpolated string handlers, so the analyzer/fixer can also flag a call like sb.AppendFormat("...", ...)
and fix it into sb.Append($"...")
.
Alternative Designs
https://github.com/dotnet/roslyn-analyzers/pull/6674 is adding an analyzer (fixer to follow) focused on string.Format and StringBuilder.AppendFormat calls where the format string isn't a literal; in such cases, we can use CompositeFormat
in .NET 8 to achieve the same throughput/allocation efficiencies at run-time (paying some start-up cost that interpolated strings avoid when the format string is known at compile-time). It also tentatively currently includes a separate diagnostic for when the string is a literal, in particular because it can flag cases where the format provider argument isn't missing or null. It's unfortunate to have two of these. If we can convert the current refactoring into a proper analyzer/fixer, than we can delete this diagnostic from roslyn-analyzers. Alternatively, we could delete the refactoring, and move the current refactoring logic into this analyzer/fixer.
@arkalyanms This woudl be good to do. Can we try to get this in next sprint? Would be good to help devs on team get experience writing analyzers as well.
@CyrusNajmabadi @arkalyanms What's the status of this issue?
@Bartleby2718 Nothing has changed here. if you'd like to help out, we welcome contributions. Thanks! :)
@CyrusNajmabadi I can take a look, but I don't have too much experience with Roslyn. Could you give me some pointers? Specifically:
- Could you share a link to a PR where a fixer was added? (I suppose that will tell me how to write tests for the fixer.)
- What should be the ID/category/severity/description of the diagnostic?
- Can I start with
Format(String, Object)
and later add support for other things likeFormat(String, Object[])
andFormat(String, Object, Object)
? - The PR was already has an assignee. Should they be unassigned?
@Bartleby2718 I recommend reading https://github.com/dotnet/roslyn/blob/main/CONTRIBUTING.md first. We also have a good discord channel over at: https://discord.com/channels/143867839282020352/598678594750775301
You can find examples of anlayzers and fixers here https://github.com/dotnet/roslyn/tree/main/src/Analyzers/Core/Analyzers and https://github.com/dotnet/roslyn/tree/main/src/Analyzers/Core/CodeFixes.
Note that as this is a cross language feature, there will likely be work here, and in the corresponding C# and VB projects.
What should be the ID/category/severity/description of the diagnostic?
For the ID, pick the next multiple of 10 after the main set of IDs not used in IDEDiagnosticIds.cs. Severity should be 'hidden' or 'info'. We can decide with the PR. The description can be figured out with the PR as well.
Can I start with Format(String, Object) and later add support
We already have the refactoring. So this would just be about moving all its code, wholesale, over to the analyzer. So it would be the same set of supported cases.
The PR was already has an assignee. Should they be unassigned?
You both can be assigned to it :)
Note: the refactorings are here: https://github.com/dotnet/roslyn/blob/main/src/Features/Core/Portable/ConvertToInterpolatedString/ConvertRegularStringToInterpolatedStringRefactoringProvider.cs and https://github.com/dotnet/roslyn/tree/main/src/Features/VisualBasic/Portable/ConvertToInterpolatedString
@CyrusNajmabadi Actually, I see that the file next to it--AbstractConvertPlaceholderToInterpolatedStringRefactoringProvider.cs
--seems more relevant to what @stephentoub wrote. Does that sound right?
We probably want to move them all over. With separate diagnostic IDs for each
Note: VB's interpolated string is much less optimized than C#.
It is always naively transpiled into string.Format
if there are holes and does not use string.Concat
or DefaultInterpolatedStringHandler
.
Also it cannot be assign to string constants unlike C# 10 or later.
There are no plans of its improvement.
I think &
(equivalent to +
in C#) is faster. The performance is not one of pros in VB's interpolated string.
I would not interpolated string to be recommended in VB. (I recommend the priority "silent" instead of "recommended" only in VB)
@tats-u If you don't mind, could you point me to an example where different EnforceOnBuildValues
were used for different languages? Given that the shared values are defined in https://github.com/dotnet/roslyn/blob/main/src/Analyzers/Core/Analyzers/EnforceOnBuildValues.cs, I thought there would be VB-specific values defined in https://github.com/dotnet/roslyn/blob/main/src/Analyzers/VisualBasic/Analyzers/EnforceOnBuildValues.cs, but I don't see such a file. Should I create one there?
@Bartleby2718 Sorry I don't know much about it because I'm just an outsider, but I can only say that values like it are just constants and used only in the second argument of the constructor of the abstract class AbstractBuiltInCodeStyleDiagnosticAnalyzer
.
https://github.com/search?q=repo%3Adotnet%2Froslyn%20AbstractRemoveRedundantEqualityDiagnosticAnalyzer&type=code
I wonder if the priority can be changed depending on languages.
Yeah I should be able to update this line to use an abstract
value, so that each language can have a different value. Will take a note of this suggestion.
How about adding an optional argument to the constructor?
protected AbstractUseInterpolatedStringDiagnosticAnalyzer(EnforceOnBuildValues enforceOnBuildValues = EnforceOnBuildValues.UseInterpolatedString)
@Bartleby2718 Sorry I've misunderstood. +
& &
are not concerned with this issue.
String.Format
→ $"..."
conversion can be done even in VB with the same priority as C# because $"..."
isn't faster unlike C# but is still equivalent.
We can fix only this kind of code unconditionally even in VB.
However, StringBuilder.Append($"...")
is resolved into String.Builder.Append(String.Format("...", ...))
, which is inefficient, in VB.
String handler overloads are not used unlike C#, and string overloads are used instead.
VB Analyzer can lack this kind of fix.
https://sharplab.io/#v2:DYLgbgRgPgkgtgBwPYCcAuBnABAZQJ4ZoCmcAsAFDzLrb6EkB0AKkQB5oUAKArhMAJYBjLAGFgAQwzYRFLHKw8+QrADFuAO0Fp+SdVgCyACnFZJWfuuIBzIigCUWAIK00KC1dnyvAEX5wsGBBO2AByRADuuK7uAELc/MAAJraGdp5ecoEMjggIROqJhgAkAEQA3uIAvlgAelgATFgAvFgVWABUppUlaeQZ8gBKRGjcKHpZTEg40epWqelYAKIFqhpaOuoUy4miElJAA=
Now we don't have to prepare a dedicated EnforceOnBuildValues
for VB as of now.