roslyn icon indicating copy to clipboard operation
roslyn copied to clipboard

Promote "Convert to interpolated string" to an analyzer/fixer

Open stephentoub opened this issue 1 year ago • 14 comments

Summary

The current "Convert to interpolated string" refactoring is really helpful, but it can't be enabled for builds and it doesn't have a fix-all. Ideally it would be converted into an IDEXXXX or CAXXXX analyzer/fixer.

Background and Motivation

"Convert to interpolated string" used to be primarily about code style, but with the improvements to interpolated strings that came in .NET 6 and C# 10, it's now a performance feature. Anywhere code is doing string.Format("...", ...), it's leaving performance (throughput and allocation) on the table by not using an interpolated string.

Proposed Feature

P0: Take the existing functionality and just make it an analyzer/fixer with fix-all support. That way, for example dotnet/runtime can enable it in builds to flag any places we're missing out on free wins, and also auto-fix everything so that we don't have to do the painstaking transformations manually.

P1: Since the refactoring was initially written, the new string.Create(...) method was introduced, which enables supporting culture. Currently the refactoring supports string.Format("...", ...) and string.Format(null, "...", ...), but it gives up on e.g. string.Format(CultureInfo.InvariantCulture, "...", ...). That can now be fixed into string.Create(CultureInfo.InvariantCulture, $"...").

P2: Other APIs now have interpolated string handlers that allow for similar efficiency gains. In particular, StringBuilder has long provided AppendFormat methods that take the same arguments as string.Format, and appends the results to the builder. As of .NET 6, StringBuilder.Append and StringBuilder.AppendLine have overloads that take interpolated string handlers, so the analyzer/fixer can also flag a call like sb.AppendFormat("...", ...) and fix it into sb.Append($"...").

Alternative Designs

https://github.com/dotnet/roslyn-analyzers/pull/6674 is adding an analyzer (fixer to follow) focused on string.Format and StringBuilder.AppendFormat calls where the format string isn't a literal; in such cases, we can use CompositeFormat in .NET 8 to achieve the same throughput/allocation efficiencies at run-time (paying some start-up cost that interpolated strings avoid when the format string is known at compile-time). It also tentatively currently includes a separate diagnostic for when the string is a literal, in particular because it can flag cases where the format provider argument isn't missing or null. It's unfortunate to have two of these. If we can convert the current refactoring into a proper analyzer/fixer, than we can delete this diagnostic from roslyn-analyzers. Alternatively, we could delete the refactoring, and move the current refactoring logic into this analyzer/fixer.

stephentoub avatar Jun 07 '23 16:06 stephentoub

@arkalyanms This woudl be good to do. Can we try to get this in next sprint? Would be good to help devs on team get experience writing analyzers as well.

CyrusNajmabadi avatar Jun 07 '23 16:06 CyrusNajmabadi

@CyrusNajmabadi @arkalyanms What's the status of this issue?

Bartleby2718 avatar Apr 07 '24 19:04 Bartleby2718

@Bartleby2718 Nothing has changed here. if you'd like to help out, we welcome contributions. Thanks! :)

CyrusNajmabadi avatar Apr 07 '24 19:04 CyrusNajmabadi

@CyrusNajmabadi I can take a look, but I don't have too much experience with Roslyn. Could you give me some pointers? Specifically:

  1. Could you share a link to a PR where a fixer was added? (I suppose that will tell me how to write tests for the fixer.)
  2. What should be the ID/category/severity/description of the diagnostic?
  3. Can I start with Format(String, Object) and later add support for other things like Format(String, Object[]) and Format(String, Object, Object)?
  4. The PR was already has an assignee. Should they be unassigned?

Bartleby2718 avatar Apr 07 '24 19:04 Bartleby2718

@Bartleby2718 I recommend reading https://github.com/dotnet/roslyn/blob/main/CONTRIBUTING.md first. We also have a good discord channel over at: https://discord.com/channels/143867839282020352/598678594750775301

You can find examples of anlayzers and fixers here https://github.com/dotnet/roslyn/tree/main/src/Analyzers/Core/Analyzers and https://github.com/dotnet/roslyn/tree/main/src/Analyzers/Core/CodeFixes.

Note that as this is a cross language feature, there will likely be work here, and in the corresponding C# and VB projects.

What should be the ID/category/severity/description of the diagnostic?

For the ID, pick the next multiple of 10 after the main set of IDs not used in IDEDiagnosticIds.cs. Severity should be 'hidden' or 'info'. We can decide with the PR. The description can be figured out with the PR as well.

Can I start with Format(String, Object) and later add support

We already have the refactoring. So this would just be about moving all its code, wholesale, over to the analyzer. So it would be the same set of supported cases.

The PR was already has an assignee. Should they be unassigned?

You both can be assigned to it :)

CyrusNajmabadi avatar Apr 07 '24 19:04 CyrusNajmabadi

Note: the refactorings are here: https://github.com/dotnet/roslyn/blob/main/src/Features/Core/Portable/ConvertToInterpolatedString/ConvertRegularStringToInterpolatedStringRefactoringProvider.cs and https://github.com/dotnet/roslyn/tree/main/src/Features/VisualBasic/Portable/ConvertToInterpolatedString

CyrusNajmabadi avatar Apr 07 '24 19:04 CyrusNajmabadi

@CyrusNajmabadi Actually, I see that the file next to it--AbstractConvertPlaceholderToInterpolatedStringRefactoringProvider.cs--seems more relevant to what @stephentoub wrote. Does that sound right?

Bartleby2718 avatar Apr 14 '24 03:04 Bartleby2718

We probably want to move them all over. With separate diagnostic IDs for each

CyrusNajmabadi avatar Apr 14 '24 04:04 CyrusNajmabadi

Note: VB's interpolated string is much less optimized than C#. It is always naively transpiled into string.Format if there are holes and does not use string.Concat or DefaultInterpolatedStringHandler. Also it cannot be assign to string constants unlike C# 10 or later. There are no plans of its improvement. I think & (equivalent to + in C#) is faster. The performance is not one of pros in VB's interpolated string. I would not interpolated string to be recommended in VB. (I recommend the priority "silent" instead of "recommended" only in VB)

tats-u avatar Apr 20 '24 16:04 tats-u

@tats-u If you don't mind, could you point me to an example where different EnforceOnBuildValues were used for different languages? Given that the shared values are defined in https://github.com/dotnet/roslyn/blob/main/src/Analyzers/Core/Analyzers/EnforceOnBuildValues.cs, I thought there would be VB-specific values defined in https://github.com/dotnet/roslyn/blob/main/src/Analyzers/VisualBasic/Analyzers/EnforceOnBuildValues.cs, but I don't see such a file. Should I create one there?

Bartleby2718 avatar Apr 21 '24 01:04 Bartleby2718

@Bartleby2718 Sorry I don't know much about it because I'm just an outsider, but I can only say that values like it are just constants and used only in the second argument of the constructor of the abstract class AbstractBuiltInCodeStyleDiagnosticAnalyzer.

https://github.com/search?q=repo%3Adotnet%2Froslyn%20AbstractRemoveRedundantEqualityDiagnosticAnalyzer&type=code

I wonder if the priority can be changed depending on languages.

tats-u avatar Apr 21 '24 08:04 tats-u

Yeah I should be able to update this line to use an abstract value, so that each language can have a different value. Will take a note of this suggestion.

Bartleby2718 avatar Apr 21 '24 10:04 Bartleby2718

How about adding an optional argument to the constructor?

protected AbstractUseInterpolatedStringDiagnosticAnalyzer(EnforceOnBuildValues enforceOnBuildValues =  EnforceOnBuildValues.UseInterpolatedString)

tats-u avatar Apr 21 '24 10:04 tats-u

@Bartleby2718 Sorry I've misunderstood. + & & are not concerned with this issue. String.Format$"..." conversion can be done even in VB with the same priority as C# because $"..." isn't faster unlike C# but is still equivalent. We can fix only this kind of code unconditionally even in VB. However, StringBuilder.Append($"...") is resolved into String.Builder.Append(String.Format("...", ...)), which is inefficient, in VB. String handler overloads are not used unlike C#, and string overloads are used instead. VB Analyzer can lack this kind of fix.

https://sharplab.io/#v2:DYLgbgRgPgkgtgBwPYCcAuBnABAZQJ4ZoCmcAsAFDzLrb6EkB0AKkQB5oUAKArhMAJYBjLAGFgAQwzYRFLHKw8+QrADFuAO0Fp+SdVgCyACnFZJWfuuIBzIigCUWAIK00KC1dnyvAEX5wsGBBO2AByRADuuK7uAELc/MAAJraGdp5ecoEMjggIROqJhgAkAEQA3uIAvlgAelgATFgAvFgVWABUppUlaeQZ8gBKRGjcKHpZTEg40epWqelYAKIFqhpaOuoUy4miElJAA=

Now we don't have to prepare a dedicated EnforceOnBuildValues for VB as of now.

tats-u avatar May 08 '24 14:05 tats-u