csharplang
csharplang copied to clipboard
[Proposal]: Improved interpolated strings
Improved interpolated strings
- [x] Proposed
- [ ] Prototype: Not Started
- [ ] Implementation: Not Started
- [ ] Specification: Not Started
Split out from https://github.com/dotnet/csharplang/issues/2302.
Specification is here: https://github.com/dotnet/csharplang/blob/main/proposals/csharp-10.0/improved-interpolated-strings.md
LDM Notes:
https://github.com/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-03-01.md#interpolated-string-improvements https://github.com/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-03-15.md#interpolated-string-improvements https://github.com/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-03-24.md#improved-interpolated-strings https://github.com/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-08-25.md
I don't understand this part of the spec:
By making the builder an
outparameter we allow overloading of theGetInterpolatedStringBuildermethod by builder type, which is useful for scenarios like the logger above, which could haveTraceLoggerParamsBuilder/DebugLoggerParamsBuilder/WarningLoggerParamsBuilder/etc.
How would the overload occur in practice when the compiler translates the string interpolation expression using out var or equivalent?:
TraceLoggerParamsBuilder.GetInterpolatedStringBuilder(baseLength: 47, formatHoleCount: 1, receiverTemp, out var builder);
How would the compiler know which overload to select if there were multiple? By my reading of the spec, GetInterpolatedStringBuilder would have to be implemented as a static method on 3 different types in order to have TraceLoggerParamsBuilder/DebugLoggerParamsBuilder/WarningLoggerParamsBuilder rather than being able to use an overload.
@yaakov-h I used var for convenience there. It would not be var in practice.
In that case how would the compiler ever resolve an overload to something other than the type itself?
When an interpolated_string_expression is passed as an argument to a method, we look at the type of the parameter. If the parameter type has a static method GetInterpolatedStringBuilder ... and has an out parameter of the type of original method's parameter
It seems like those two types will always be the same, so where is there room for overloading?
Or would it be the case that a base class could provide overloads for a number of its descendant classes?
In that case how would the compiler ever resolve an overload to something other than the type itself?
I'm not sure what you mean. The idea is that a logging provider could provide GetInterpolatedString(..., out TraceLoggerParamsBuilder/InfoLoggerParamsBuilder/etc builder), and then LogTrace(TraceLoggerParamsBuilder builder), LogInfo(InfoLoggerParamsBuilder builder), etc. We'd then know which version of GetInterpolatedString to call based on the builder type.
I'm confused then.
My understanding based on my reading of the spec is that the parameter type has to provide GetInterpolatedString, so you would need:
ref struct TraceLoggerParamsBuilder
{
public static void GetInterpolatedStringBuilder(..., out TraceLoggerParamsBuilder builder) { ... }
}
ref struct InfoLoggerParamsBuilder
{
public static void GetInterpolatedStringBuilder(..., out InfoLoggerParamsBuilder builder) { ... }
}
ref struct WarnLoggerParamsBuilder
{
public static void GetInterpolatedStringBuilder(..., out WarnLoggerParamsBuilder builder) { ... }
}
interface ILog
{
void LogTrace(TraceLoggerParamsBuilder builder);
void LogInfo(InfoLoggerParamsBuilder builder);
void LogWarn(WarnLoggerParamsBuilder builder);
}
How could this be restructured such that one class defines all three overloads of GetInterpolatedStringBuilder?
How could this be restructured such that one class defines all three overloads of
GetInterpolatedStringBuilder?
Could we you attributes instead?
ref struct LogParamsBuilder {
public static InfoLoggerParamsBuilder GetTraceInterpolatedStringBuilder(...) { ... }
public static InfoLoggerParamsBuilder GetInfoInterpolatedStringBuilder(...) { ... }
public static InfoLoggerParamsBuilder GetWarnInterpolatedStringBuilder(...) { ... }
}
interface ILog
{
void LogTrace([ParamBuilder("GetTraceInterpolatedStringBuilder")] LogParamsBuilder builder);
void LogInfo([ParamBuilder("GetInfoInterpolatedStringBuilder")] LogParamsBuilder builder);
void LogWarn([ParamBuilder("GetWarnInterpolatedStringBuilder")] LogParamsBuilder builder);
}
Where the attribute takes the method name of the type. It could infer the type from the parameter type like the current proposal, or optionally take a typeof(LogParamsBuilder ) if for some reason the builder is different from the ParamsBuilder
Is there a performance gain by using an out parameter or was it just for overload resolution? It seems more intuitive to return a value then to have a void method with a single out parameter.
Im not sure on how the compiler would behave here, but it seems easier for the compiler if is a parameter attribute. If its an arbitrary class as a parameter to a method, then presumably every method call has to figure out whether the parameter is a possible param builder. Whereas the attribute will efficiently and obviously indicate this behaviour
then presumably every method call has to figure out whether the parameter is a possible param builder.
Yes, but only in cases where an interpolated string is actually passed as an argument, and only for that particular parameter type. This is not likely to be a very expensive check, I'm not worried about it :).
How can this treat item format?
Log($"{i:X4}");
How can this treat item format?
However the builder wants? It needs to have an overload of TryFormat that accepts the interpolation hole content, which is both i and the string "X4". What happens after that is up to the builder type.
I want to avoid string allocation of i.ToString("X4"). As an alternative idea, I want ISpanFormattable to be public.
Span<char> buffer = stackalloc char[4];
((ISpanFormattablei)i).Format(buffer, out var written, "X4");
Log(buffer.AsSpan(0, written));
Wouldn't the existing design avoid that kind of allocation if you chose to implement it kinda like this?:
public bool TryFormat(int i, string format)
{
Span<char> buffer = stackalloc char[4];
bool formatted = i.TryFormat(buffer, out var written, format);
Debug.Assert(formatted == true);
Log(buffer.AsSpan(0, written));
}
Wouldn't the existing design avoid that kind of allocation if you chose to implement it kinda like this?:
public bool TryFormat(int i, string format) { Span<char> buffer = stackalloc char[4]; bool formatted = i.Try.Format(buffer, out var written, format); Debug.Assert(formatted == true); Log(buffer.AsSpan(0, written)); }
Yup. The builder implements the TryFormat method. We never call ToString on any components in this proposal.
I found spec https://github.com/dotnet/csharplang/pull/4486/files#diff-77eda16068bf940c41c3c540d5a0f74c1bfee4b56388b0bc4e59e037abaa9ec7R124-R125. This makes sense. Thank you.
Would implementing TryFormat automatically affect the default ToString's behavior? I didn't see this mentioned in the spec.
struct Point
{
public int X, Y;
public bool TryFormat(Span<char> destination, out in charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) =>
destination.TryWrite($"X={X} Y={Y}", out charsWritten, provider);
public override string ToString() => $"{this}"; // is this line necessary? or do I repeat it in every file?
}
(Based on an example graciously provided by @stephentoub in https://github.com/dotnet/runtime/issues/50601)
I don't particularly understand what you're asking @miyu. Presumably, the first parameter of TryWrite takes a builder type, and that would format the string however it wants.
https://github.com/dotnet/csharplang/blob/d41fcd0377db1ecccfa8de9185b9aa07f894b342/proposals/improved-interpolated-strings.md#lowering does not say how the compiler generates the int baseLength, int formatHoleCount arguments for the Create call. Is baseLength required to be exactly the total number of characters outside the interpolation holes, or is the compiler allowed to pad baselength to improve the chance that the initial buffer will be large enough? I imagine such padding could be beneficial if the compiler generates calls of the public void TryFormatInterpolationHole<T>(T value, int alignment, string format) method proposed in https://github.com/dotnet/runtime/issues/50601 and notices an unusually large alignment.
Is
baseLengthrequired to be exactly the total number of characters outside the interpolation holes, or is the compiler allowed to padbaselengthto improve the chance that the initial buffer will be large enough?
The exact length. That's why both values are passed, so the underlying API can make its own padding decisions.
@KalleOlaviNiemitalo has a good point, though. The compiler ends up potentially having more information it could pass to the Create method to enable the builder to do better presizing. The builder isn't currently told how big each hole is, and so can only guess at some average, but the compiler may be given more info in advance, e.g. an alignment that provides a minimum length on a particular hole's size. We should consider whether baseLength should instead be the minimum length the compiler guarantees will be required to format, rather than it just being the number of characters in the base strings. (It must, however, never overestimate, or some builders will end up being functionally incorrect.)
I've also been wondering if the compiler might be able to help the builder track past state to inform future use. Imagine if the builder was passed some kind of unique id per call site. It could then use that id if desired to remember the typical length of strings created for that call site in past executions in that process, in order to better presize in the Create method for future iterations. We might be able to approximate this with a CallerMemberName on Create, but that's also likely to be non-unique. (This could also be an unnecessary attempt at over optimization in the typical case that could actually deoptimize.)
I meant the spec doesn't have any wording to disallow a compiler from generating e.g. InterpolatedStringBuilder.Create(-1, -1) regardless of what is in the interpolated string expression. The argument values should be specified rather than only implied by parameter names.
The lowering section still needs to be fleshed out, and that will be based on LDM feedback on other parts of the proposal.
It may already be too late, but can the parameter of AppendLiteral be char in addition to string?
e.g. now,
string s = $"{a}.{b}.{c}.{d}"
is lowered to
handler.AppendFormatted(a);
handler.AppendLiteral(".");
handler.AppendFormatted(b);
handler.AppendLiteral(".");
handler.AppendFormatted(c);
handler.AppendLiteral(".");
handler.AppendFormatted(d);
however can handler.AppendLiteral(".") be changed to handler.AppendLiteral('.') if the literal is a single character and the handler has a char overload?
can the parameter of AppendLiteral be char in addition to string?
Is your concern performance? It won't matter with https://github.com/dotnet/runtime/pull/57217.
Is your concern performance?
Yes.
Is https://github.com/dotnet/runtime/pull/57217 as fast as char overload? Can ldstr "." (var s = ".") be optimized as fast as ldc.i4.s 46 (char c = '.')?
@ufcpp As far as I can tell, it can. In my simplified test code, both versions end up with the same code for the fast path and using mov word ptr [edx+eax*2], 0x2e for the ..
Does that mean the recently-created https://docs.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1834 is now obsolete, or is StringBuilder itself still slower?

No, this optimization only works due to the call site being a literal. With AppendLiteral, that's expected 100% of the time with normal usage. That expectation let's us mark it as aggressive inlining essentially without penalty. That's not the case for StringBuilder; it's very frequently used with variables, which would make aggressively inlining it a bad choice, and adding a fast path for single chars would potentially regress a common case.
We need to consider adding a new parameter that indicates the handler treats string or char constants in the interpolated string in the same way as literal strings there to the constructor of InterpolatedStringHandlerAttribute.
Console.Error.WriteLine(string.Create(CultureInfo.InvariantCulture, $"The value of {nameof(FooBar)} is incorrect: {FooBar.Value}"));
↑nameof(FooBar) can be merged into the adjacent literals here. ("The value of FooBar is incorrect: ") However, some custom handlers may not prefer this behavior.
If the interpolated string is evaluated as string, we can optimize it at the compiler's discretion: dotnet/roslyn#72308 dotnet/roslyn#71801
However, the specification of InterpolatedStringHandlerAttribute must be revised to deal with those that are evaluated as handler types (especially those other than DefaultInterpolatedStringHandler).