String interpolation doesn't allow bracket escapes in format string
Describe the bug
Roslyn no longer allows bracket escapes in format strings as of version 4.4.0-6.22561.11 (1ce8866c), which is inconsistent with the specification:
Regular_Interpolation_Format
    : ':' Interpolated_Regular_String_Element+
    ;
fragment Interpolated_Regular_String_Element
    : Interpolated_Regular_String_Character
    | Simple_Escape_Sequence
    | Hexadecimal_Escape_Sequence
    | Unicode_Escape_Sequence
    | Open_Brace_Escape_Sequence
    | Close_Brace_Escape_Sequence
    ;
Example
using static System.Console;
WriteLine($"{1:a{{}");
WriteLine($"{1:{{b}");
WriteLine($"{1:a{{b}");
WriteLine($"{1:a}}b}");
WriteLine($"{1:{{a}");
https://sharplab.io/#v2:CYLg1APgAgDABFAjANgYgdAYQPYDsDO2ANgKYDcAsAFADqATgJYAuJAMg7iQBQAkARAG9EIAIYCBAXz4BKSrUYt2nXoOHiARlNnV6zNh278hojVrm7FBlcZESJmmeYX7lRtQNuOgA===
Expected behavior
Remove escape sequences from the format string
Additional context
The code was valid before, and for some time during, .NET 6's development, but is no longer valid in .NET 6 because of a breaking change introduced in https://github.com/dotnet/roslyn/pull/57751
.NET 6 — Is that a C# 10 change, then?
It is
@Arthri – You need to report this issue to Roslyn, this repo is only for the C# Standard. Please create an issue with Roslyn and close this one.
@Nigel-Ecma this isn't a compiler bug; we consider it to be a spec bug that interpolation is specified the way that it is. It doesn't match the runtime's implementation of string.Format parsing, and C# 10's handlers were implemented matching how string.Format will actually interpret this. There's quite the long bug trail here (https://github.com/dotnet/roslyn/issues/57750 and https://github.com/dotnet/csharplang/discussions/4361 are both relevant), but in the compiler team's view, the spec needs to be updated to match what actually happens at runtime here.
@333fred – Hi Fred, @Arthri did mention the head of the thread and I did glance at it. What I did before posting was to hand-convert @Arthri’s code according to the Standard, producing:
using System;
public class Program
{
    public static void Main()
    {
        Console.WriteLine( string.Format("{0:a{{}", 1) ); // WriteLine($"{1:a{{}");
        Console.WriteLine( string.Format("{0:{{b}", 1) ); // WriteLine($"{1:{{b}");
        Console.WriteLine( string.Format("{0:a{{b}", 1) ); // WriteLine($"{1:a{{b}");
        Console.WriteLine( string.Format("{0:a}}b}", 1) ); // WriteLine($"{1:a}}b}");
        Console.WriteLine( string.Format("{0:{{a}", 1) ); // WriteLine($"{1:{{a}");
    }
}
And run that through Roslyn 4.4 using dotnetfiddle.net, which produces:
a{
{b
a{b
a}b
{a
Which I assume is what @Arthri was expecting, as does the Standard.
Given your post I've just run the same code through DotNetFiddle’s other options. Matching Roslyn 4.4 is .NET 4.7.2. Producing a runtime error are: .NET Core 3.1, .NET 5, .NET 6, .NET 7 and .NET 8 Preview 3. The error is:
Unhandled exception. System.FormatException: Input string was not in a correct format.
at System.Text.ValueStringBuilder.AppendFormatHelper(IFormatProvider provider, String format, ReadOnlySpan`1 args)
at System.String.FormatHelper(IFormatProvider provider, String format, ReadOnlySpan`1 args)
at System.String.Format(String format, Object arg0)
at Program.Main()
Command terminated by signal 6
So I postulate that the behaviour of string.Format has changed at some point in time and the C# Standard v7 follows what is now the previous behaviour.
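To see which of the five format strings a given runtime accepts, rather than stopping at the first exception, the hand-converted program can be wrapped in a small harness. This is only a sketch: the `formats` array and the output layout are illustrative choices, not something taken from the Standard or from Roslyn, and the results depend on the runtime it is run on.

```csharp
using System;
using System.Runtime.InteropServices;

public static class Program
{
    public static void Main()
    {
        // Report which runtime is executing, since the result depends on it.
        Console.WriteLine(RuntimeInformation.FrameworkDescription);

        // The same five format strings as in the hand-converted program.
        string[] formats = { "{0:a{{}", "{0:{{b}", "{0:a{{b}", "{0:a}}b}", "{0:{{a}" };

        foreach (string format in formats)
        {
            try
            {
                // Succeeds where the runtime treats {{ and }} inside the
                // format specifier as escapes.
                Console.WriteLine($"{format} -> {string.Format(format, 1)}");
            }
            catch (FormatException ex)
            {
                // Reached where the runtime rejects the format string.
                Console.WriteLine($"{format} -> FormatException: {ex.Message}");
            }
        }
    }
}
```

Running the same source under each runtime option should make it clear whether the difference lies in string.Format itself, independent of how the compiler lowers interpolated strings.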
I also note that my quick DotNetFiddle test doesn’t match what @Arthri has reported:
The code was valid before, and for some time during, .NET 6's development, but is no longer valid in .NET 6 because of a breaking change introduced in https://github.com/dotnet/roslyn/pull/57751
I’m going to guess that might be due to something like not setting langVersion as one would in VS.
As I don’t remember the C# to .NET version table in my head (poor show I know ;-)) I’ve no idea offhand whether the current Standard is correct for C# v7 and this revised string.Format behaviour is for a future version of the Standard, or not.
On an admittedly brief scan over the thread, working backwards from https://github.com/dotnet/roslyn/pull/57751, a change in behaviour of string.Format didn’t seem to be noted (*), so I’m waiting on someone more knowledgeable in that area to provide a definitive statement before I even think of looking at the grammar and semantics of string interpolation for the next edition of the Standard.
@Arthri – I'd still recommend you open a matching issue on dotnet/roslyn, don’t close this one, and x-ref the two. That way both groups will be aware and can figure this out between them.
(*) While I postulate that the behaviour of string.Format has changed, we must allow for the possibility that it is the behaviour of the compilers, and not of string.Format, that has changed – as the former has, at least for .NET 6, in https://github.com/dotnet/roslyn/pull/57751
I’m going to guess that might be due to something like not setting langVersion as one would in VS.
Apologies, I forgot to specify how I compiled it here. I checked it against the demos/lowlevelhackathon branch of Roslyn on SharpLab: https://sharplab.io/#v2:EYLgJgpgtg9gzgWgDYwO5IgNwkgFgQwGMBrfAF1xgDsAaMEAagB8ABABgAIWBGANi+4A6AMLU4MDAG4AsACgA6gCcAlmQgAZZVQgAKACQAiAN7cQ+I0YC+BgJQyFKtZu37jpi8Gt25S1Rq26hiZmHl72vk4BrsH4lpaetuGO/i5B7kaxiUA=
I believe the branch was created after the string interpolation builder changes but slightly before the bracket format changes. According to SharpLab, it dates to October 14, 2021, while the bracket format PR was merged around December.
I'd still recommend you open a matching issue on dotnet/roslyn, don’t close this one, and x-ref the two.
@Nigel-Ecma, I don't understand what you're looking to get out of a roslyn issue. There was one, and it was closed after being fixed. At this point, the thing that needs to be updated is the specification. It's certainly a little bit nebulous as to "when" the spec needs to be updated; as I understand it, string.Format changed in .NET Core 3.1, which approximately corresponds to C# 8. As the committee is currently working on the C# 8 specification, and C# 8 is not officially supported on .NET Framework (which has the old parsing behavior), I think that now is an excellent time to update the specification of interpolation formats to match the behavior of the runtime.
On an admittedly brief scan over the thread, working backwards from https://github.com/dotnet/roslyn/pull/57751, a change in behaviour of string.Format didn’t seem to be noted (*), so I’m waiting on someone more knowledgeable in that area to provide a definitive statement before I even think of looking at the grammar and semantics of string interpolation for the next edition of the Standard.
I would have expected to see it on https://learn.microsoft.com/en-us/dotnet/core/compatibility/corefx, but I don't. @stephentoub, any idea where we documented the breaking change to how string.Format parses the format part of the specifier?
Is the treatment of }} specified in ECMA-335? If it is, I think the discrepancy deserves a note. I tried to check but my phone lags badly when I try to search in the large XML file.
Yes, Arthri quoted the relevant section above.
@333fred, where was that? In the description https://github.com/dotnet/csharpstandard/issues/986#issue-1988511468, @Arthri quoted the Regular_Interpolation_Format syntax, but that is from ECMA-334 (C#) and not from ECMA-335 (CLI). If ECMA-335 specifies that the System.String.Format method parses }} as an escaped } within a format string, but ECMA-334 is changed to require different parsing in an interpolated string, then I think this difference deserves a note, regardless of whether .NET Core deviates from ECMA-335 in this respect.
Oh, sorry, I misread 334 for 335. I do not believe that 335 specifies the behavior of pretty much any part of the BCL, string.Format included.
Is the treatment of }} specified in ECMA-335? If it is, I think the discrepancy deserves a note. I tried to check but my phone lags badly when I try to search in the large XML file.
Format specifiers are covered in the description of System.IFormattable but that does not cover brace escapes (this description is informatively included in 334’s C.4). Brace escapes could (I’ve not checked them all!) occur in informative examples in 335.
I thought it might be specified in ECMA-335 Partition IV: Profiles and Libraries.
In CLILibraryTypes.xml, the System.String type has remarks that include a note that describes the syntax of format strings, and an example that shows {{ and }} not adjacent to a format specification. But it does not say that a closing curly bracket could be part of a format specifier. According to IV.7.1, the remarks element is normative, and the example element is informative. <block type="note"> is categorised as "Rendering/Formatting" and apparently inherits the normativeness of the parent remarks element.
The specification of String.Format(IFormatProvider, String, Object[]) in that file has even less detail.
So AFAICT the String.Format behaviour change does not violate ECMA-335 6th edition.
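For contrast, the uncontested case that the System.String note describes, {{ and }} appearing outside any format specification, can be sketched as follows. The program is a minimal illustration written for this thread, not the example from CLILibraryTypes.xml.

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        // Brace escapes outside any format item: "{{" and "}}" become "{" and "}".
        Console.WriteLine(string.Format("{{{0}}}", 1)); // prints {1}

        // The same escapes in an interpolated string with no format specifier.
        Console.WriteLine($"{{{1}}}");                  // prints {1}
    }
}
```

Only the placement of the escapes inside the part after the colon, as in {0:a}}b}, is what this thread is about.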
I would have expected to see it on https://learn.microsoft.com/en-us/dotnet/core/compatibility/corefx, but I don't
I would have as well, but we weren't as good then about officially documenting such things.