Improved interpolated strings

[x] Proposed
[ ] Prototype: Not Started
[ ] Implementation: Not Started
[ ] Specification: Not Started

Split out from https://github.com/dotnet/csharplang/issues/2302.

Specification is here: https://github.com/dotnet/csharplang/blob/main/proposals/csharp-10.0/improved-interpolated-strings.md

LDM Notes:

https://github.com/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-03-01.md#interpolated-string-improvements https://github.com/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-03-15.md#interpolated-string-improvements https://github.com/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-03-24.md#improved-interpolated-strings https://github.com/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-08-25.md

Mar 01 '21 22:03 333fred

I don't understand this part of the spec:

By making the builder an out parameter we allow overloading of the GetInterpolatedStringBuilder method by builder type, which is useful for scenarios like the logger above, which could have TraceLoggerParamsBuilder/DebugLoggerParamsBuilder/WarningLoggerParamsBuilder/etc.

How would the overload occur in practice when the compiler translates the string interpolation expression using out var or equivalent?:

TraceLoggerParamsBuilder.GetInterpolatedStringBuilder(baseLength: 47, formatHoleCount: 1, receiverTemp, out var builder);

How would the compiler know which overload to select if there were multiple? By my reading of the spec, GetInterpolatedStringBuilder would have to be implemented as a static method on 3 different types in order to have TraceLoggerParamsBuilder/DebugLoggerParamsBuilder/WarningLoggerParamsBuilder rather than being able to use an overload.

Mar 02 '21 10:03 yaakov-h

@yaakov-h I used var for convenience there. It would not be var in practice.

Mar 02 '21 16:03 333fred

In that case how would the compiler ever resolve an overload to something other than the type itself?

When an interpolated_string_expression is passed as an argument to a method, we look at the type of the parameter. If the parameter type has a static method GetInterpolatedStringBuilder ... and has an out parameter of the type of original method's parameter

It seems like those two types will always be the same, so where is there room for overloading?

Or would it be the case that a base class could provide overloads for a number of its descendant classes?

Mar 02 '21 21:03 yaakov-h

In that case how would the compiler ever resolve an overload to something other than the type itself?

I'm not sure what you mean. The idea is that a logging provider could provide GetInterpolatedString(..., out TraceLoggerParamsBuilder/InfoLoggerParamsBuilder/etc builder), and then LogTrace(TraceLoggerParamsBuilder builder), LogInfo(InfoLoggerParamsBuilder builder), etc. We'd then know which version of GetInterpolatedString to call based on the builder type.

Mar 02 '21 21:03 333fred

I'm confused then.

My understanding based on my reading of the spec is that the parameter type has to provide GetInterpolatedString, so you would need:

ref struct TraceLoggerParamsBuilder
{
    public static void GetInterpolatedStringBuilder(..., out TraceLoggerParamsBuilder builder) { ... }
}

ref struct InfoLoggerParamsBuilder
{
    public static void GetInterpolatedStringBuilder(..., out InfoLoggerParamsBuilder builder) { ... }
}

ref struct WarnLoggerParamsBuilder
{
    public static void GetInterpolatedStringBuilder(..., out WarnLoggerParamsBuilder builder) { ... }
}

interface ILog
{
    void LogTrace(TraceLoggerParamsBuilder builder);
    void LogInfo(InfoLoggerParamsBuilder builder);
    void LogWarn(WarnLoggerParamsBuilder builder);
}

How could this be restructured such that one class defines all three overloads of GetInterpolatedStringBuilder?

Mar 02 '21 23:03 yaakov-h

How could this be restructured such that one class defines all three overloads of GetInterpolatedStringBuilder?

Could we you attributes instead?

ref struct LogParamsBuilder {
     public static InfoLoggerParamsBuilder GetTraceInterpolatedStringBuilder(...) { ... }
     public static InfoLoggerParamsBuilder GetInfoInterpolatedStringBuilder(...) { ... }
     public static InfoLoggerParamsBuilder GetWarnInterpolatedStringBuilder(...) { ... }
}


interface ILog
{
    void LogTrace([ParamBuilder("GetTraceInterpolatedStringBuilder")] LogParamsBuilder builder);
    void LogInfo([ParamBuilder("GetInfoInterpolatedStringBuilder")] LogParamsBuilder builder);
    void LogWarn([ParamBuilder("GetWarnInterpolatedStringBuilder")] LogParamsBuilder builder);
}

Where the attribute takes the method name of the type. It could infer the type from the parameter type like the current proposal, or optionally take a typeof(LogParamsBuilder ) if for some reason the builder is different from the ParamsBuilder

Is there a performance gain by using an out parameter or was it just for overload resolution? It seems more intuitive to return a value then to have a void method with a single out parameter.

Mar 26 '21 22:03 bjbr-dev

Im not sure on how the compiler would behave here, but it seems easier for the compiler if is a parameter attribute. If its an arbitrary class as a parameter to a method, then presumably every method call has to figure out whether the parameter is a possible param builder. Whereas the attribute will efficiently and obviously indicate this behaviour

Mar 26 '21 22:03 bjbr-dev

then presumably every method call has to figure out whether the parameter is a possible param builder.

Yes, but only in cases where an interpolated string is actually passed as an argument, and only for that particular parameter type. This is not likely to be a very expensive check, I'm not worried about it :).

Mar 26 '21 22:03 333fred

How can this treat item format?

Log($"{i:X4}");

Mar 31 '21 04:03 ufcpp

How can this treat item format?

However the builder wants? It needs to have an overload of TryFormat that accepts the interpolation hole content, which is both i and the string "X4". What happens after that is up to the builder type.

Mar 31 '21 05:03 333fred

I want to avoid string allocation of i.ToString("X4"). As an alternative idea, I want ISpanFormattable to be public.

Span<char> buffer = stackalloc char[4];
((ISpanFormattablei)i).Format(buffer, out var written, "X4");
Log(buffer.AsSpan(0, written));

Mar 31 '21 05:03 ufcpp

Wouldn't the existing design avoid that kind of allocation if you chose to implement it kinda like this?:

public bool TryFormat(int i, string format)
{
    Span<char> buffer = stackalloc char[4];
    bool formatted = i.TryFormat(buffer, out var written, format);
    Debug.Assert(formatted == true);

    Log(buffer.AsSpan(0, written));
}

Mar 31 '21 06:03 yaakov-h

Wouldn't the existing design avoid that kind of allocation if you chose to implement it kinda like this?:
public bool TryFormat(int i, string format)
{
    Span<char> buffer = stackalloc char[4];
    bool formatted = i.Try.Format(buffer, out var written, format);
    Debug.Assert(formatted == true);

    Log(buffer.AsSpan(0, written));
}

Yup. The builder implements the TryFormat method. We never call ToString on any components in this proposal.

Mar 31 '21 06:03 333fred

I found spec https://github.com/dotnet/csharplang/pull/4486/files#diff-77eda16068bf940c41c3c540d5a0f74c1bfee4b56388b0bc4e59e037abaa9ec7R124-R125. This makes sense. Thank you.

Mar 31 '21 16:03 ufcpp

Would implementing TryFormat automatically affect the default ToString's behavior? I didn't see this mentioned in the spec.

struct Point
{
    public int X, Y;
    public bool TryFormat(Span<char> destination, out in charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) =>
        destination.TryWrite($"X={X} Y={Y}", out charsWritten, provider);
    public override string ToString() => $"{this}"; // is this line necessary? or do I repeat it in every file?
}

(Based on an example graciously provided by @stephentoub in https://github.com/dotnet/runtime/issues/50601)

Apr 02 '21 04:04 miyu

I don't particularly understand what you're asking @miyu. Presumably, the first parameter of TryWrite takes a builder type, and that would format the string however it wants.

Apr 02 '21 05:04 333fred

https://github.com/dotnet/csharplang/blob/d41fcd0377db1ecccfa8de9185b9aa07f894b342/proposals/improved-interpolated-strings.md#lowering does not say how the compiler generates the int baseLength, int formatHoleCount arguments for the Create call. Is baseLength required to be exactly the total number of characters outside the interpolation holes, or is the compiler allowed to pad baselength to improve the chance that the initial buffer will be large enough? I imagine such padding could be beneficial if the compiler generates calls of the public void TryFormatInterpolationHole<T>(T value, int alignment, string format) method proposed in https://github.com/dotnet/runtime/issues/50601 and notices an unusually large alignment.

Apr 03 '21 14:04 KalleOlaviNiemitalo

Is baseLength required to be exactly the total number of characters outside the interpolation holes, or is the compiler allowed to pad baselength to improve the chance that the initial buffer will be large enough?

The exact length. That's why both values are passed, so the underlying API can make its own padding decisions.

Apr 03 '21 20:04 333fred

@KalleOlaviNiemitalo has a good point, though. The compiler ends up potentially having more information it could pass to the Create method to enable the builder to do better presizing. The builder isn't currently told how big each hole is, and so can only guess at some average, but the compiler may be given more info in advance, e.g. an alignment that provides a minimum length on a particular hole's size. We should consider whether baseLength should instead be the minimum length the compiler guarantees will be required to format, rather than it just being the number of characters in the base strings. (It must, however, never overestimate, or some builders will end up being functionally incorrect.)

I've also been wondering if the compiler might be able to help the builder track past state to inform future use. Imagine if the builder was passed some kind of unique id per call site. It could then use that id if desired to remember the typical length of strings created for that call site in past executions in that process, in order to better presize in the Create method for future iterations. We might be able to approximate this with a CallerMemberName on Create, but that's also likely to be non-unique. (This could also be an unnecessary attempt at over optimization in the typical case that could actually deoptimize.)

Apr 03 '21 21:04 stephentoub

I meant the spec doesn't have any wording to disallow a compiler from generating e.g. InterpolatedStringBuilder.Create(-1, -1) regardless of what is in the interpolated string expression. The argument values should be specified rather than only implied by parameter names.

Apr 04 '21 04:04 KalleOlaviNiemitalo

The lowering section still needs to be fleshed out, and that will be based on LDM feedback on other parts of the proposal.

Apr 04 '21 04:04 333fred

It may already be too late, but can the parameter of AppendLiteral be char in addition to string?

e.g. now,

string s = $"{a}.{b}.{c}.{d}"

is lowered to

handler.AppendFormatted(a);
handler.AppendLiteral(".");
handler.AppendFormatted(b);
handler.AppendLiteral(".");
handler.AppendFormatted(c);
handler.AppendLiteral(".");
handler.AppendFormatted(d);

however can handler.AppendLiteral(".") be changed to handler.AppendLiteral('.') if the literal is a single character and the handler has a char overload?

Aug 14 '21 03:08 ufcpp

can the parameter of AppendLiteral be char in addition to string?

Is your concern performance? It won't matter with https://github.com/dotnet/runtime/pull/57217.

Aug 14 '21 10:08 stephentoub

Is your concern performance?

Yes.

Is https://github.com/dotnet/runtime/pull/57217 as fast as char overload? Can ldstr "." (var s = ".") be optimized as fast as ldc.i4.s 46 (char c = '.')?

Aug 14 '21 13:08 ufcpp

@ufcpp As far as I can tell, it can. In my simplified test code, both versions end up with the same code for the fast path and using mov word ptr [edx+eax*2], 0x2e for the ..

Aug 14 '21 15:08 svick

Does that mean the recently-created https://docs.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1834 is now obsolete, or is StringBuilder itself still slower?

Aug 14 '21 15:08 jnm2

No, this optimization only works due to the call site being a literal. With AppendLiteral, that's expected 100% of the time with normal usage. That expectation let's us mark it as aggressive inlining essentially without penalty. That's not the case for StringBuilder; it's very frequently used with variables, which would make aggressively inlining it a bad choice, and adding a fast path for single chars would potentially regress a common case.

Aug 14 '21 15:08 stephentoub

We need to consider adding a new parameter that indicates the handler treats string or char constants in the interpolated string in the same way as literal strings there to the constructor of InterpolatedStringHandlerAttribute.

Console.Error.WriteLine(string.Create(CultureInfo.InvariantCulture, $"The value of {nameof(FooBar)} is incorrect: {FooBar.Value}"));

↑nameof(FooBar) can be merged into the adjacent literals here. ("The value of FooBar is incorrect: ") However, some custom handlers may not prefer this behavior.

If the interpolated string is evaluated as string, we can optimize it at the compiler's discretion: dotnet/roslyn#72308 dotnet/roslyn#71801 However, the specification of InterpolatedStringHandlerAttribute must be revised to deal with those that are evaluated as handler types (especially those other than DefaultInterpolatedStringHandler).

May 06 '24 11:05 tats-u

csharplang
csharplang copied to clipboard

[Proposal]: Improved interpolated strings

Improved interpolated strings

csharplang csharplang copied to clipboard

[Proposal]: Improved interpolated strings

Improved interpolated strings

csharplang
csharplang copied to clipboard