[Breaking change]: In .NET 8 StreamReader emits Unicode replacement character, .NET 7 did not

Description

When a StreamReader created with the default encoding (UTF-8) encounters a UTF-8 character that is cut off partway through its byte sequence (one particular kind of invalid UTF-8 byte sequence), the result differs between .NET 7 and .NET 8. I wasn't able to find docs mentioning this change.
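As a concrete picture of a character cut in half, here is a small sketch (added for illustration, with arbitrary variable names) that prints the UTF-8 bytes of the test string used in the repro below. U+00B7 takes two bytes, so a three-byte prefix of the encoded string ends right after the lead byte.

```csharp
using System;
using System.Text;

// Same test string as the repro: two spaces, U+00B7 (middle dot), two spaces.
byte[] bytes = Encoding.UTF8.GetBytes("  \u00B7  ");

// U+00B7 takes two bytes in UTF-8 (C2 B7); each space is a single byte (20).
string hex = BitConverter.ToString(bytes);
Console.WriteLine(hex); // 20-20-C2-B7-20-20

// The 3-byte prefix stops after the lead byte C2, leaving the sequence incomplete.
string prefixHex = BitConverter.ToString(bytes, 0, 3);
Console.WriteLine(prefixHex); // 20-20-C2
```

This is why only the third line of output differs between the two runtimes: it is the only prefix that ends in the middle of a multi-byte sequence.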

Repro code:

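using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
using System.Text.Json;

// U+00B7 (middle dot) encodes to two bytes in UTF-8; the surrounding spaces are one byte each.
var str = "  \u00B7  ";
var bytes = Encoding.UTF8.GetBytes(str);
Console.WriteLine("Framework: " + RuntimeInformation.FrameworkDescription);

// Read every prefix of the byte array; the 3-byte prefix ends in the middle of U+00B7.
for (var i = 1; i <= bytes.Length; i++)
{
    var range = bytes[0..i];
    var readByStreamReader = new StreamReader(new MemoryStream(range)).ReadToEnd();
    // Serialize to JSON so non-ASCII characters are printed as \uXXXX escapes.
    Console.WriteLine(JsonSerializer.Serialize(readByStreamReader));
}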

Output in .NET 7 (no replacement character emitted):

Framework: .NET 7.0.14
" "
"  "
"  "
"  \u00B7"
"  \u00B7 "
"  \u00B7  "

Output in .NET 8 (replacement character emitted):

Framework: .NET 8.0.0
" "
"  "
"  \uFFFD"
"  \u00B7"
"  \u00B7 "
"  \u00B7  "

Version

.NET 8 GA

Previous behavior

In .NET 7, StreamReader emitted nothing for the incomplete sequence. I noticed the change on .NET 8 GA; I did not test .NET 8 previews.

New behavior

StreamReader now emits a \uFFFD character (the Unicode replacement character) for the incomplete sequence. Previously nothing was emitted.
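The sketch below shows how the same truncated bytes can produce either result at the System.Text.Decoder level, depending on whether the decoder is told the input is complete. It is only an illustration of the two outcomes and makes no claim about how StreamReader is implemented or why its behavior changed; the variable names are arbitrary.

```csharp
using System;
using System.Text;

// Two spaces followed by C2, the lead byte of a two-byte UTF-8 sequence.
byte[] truncated = { 0x20, 0x20, 0xC2 };
char[] buffer = new char[8];

// flush: false keeps the dangling C2 in the decoder's state, waiting for more input.
Decoder pending = Encoding.UTF8.GetDecoder();
int withoutFlush = pending.GetChars(truncated, 0, truncated.Length, buffer, 0, flush: false);
Console.WriteLine(withoutFlush); // 2

// flush: true treats the input as complete, so the dangling byte becomes U+FFFD.
Decoder flushing = Encoding.UTF8.GetDecoder();
int withFlush = flushing.GetChars(truncated, 0, truncated.Length, buffer, 0, flush: true);
Console.WriteLine(withFlush);             // 3
Console.WriteLine(buffer[2] == '\uFFFD'); // True
```

The unflushed result corresponds to the .NET 7 output above (two spaces, nothing else), and the flushed result corresponds to the .NET 8 output (two spaces plus \uFFFD).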

Type of breaking change

[ ] Binary incompatible: Existing binaries may encounter a breaking change in behavior, such as failure to load or execute, and if so, require recompilation.
[ ] Source incompatible: When recompiled using the new SDK or component or to target the new runtime, existing source code may require source changes to compile successfully.
[X] Behavioral change: Existing binaries may behave differently at run time.

Reason for change

I think the product team can provide details.

Recommended action

Document the change.

Feature area

Globalization

Affected APIs

System.IO.StreamReader

Nov 15 '23 18:11 joelverhagen

@stephentoub Can you take a look at this issue and see if it should be documented as a breaking change?

Nov 15 '23 18:11 gewarren

When I ran this code on my Linux and macOS builds, I noticed that this behavior change may be observed differently by users depending on their platform or Unicode implementation, e.g. ICU vs. NLS on Windows.

using System.Globalization;
using System.Runtime.InteropServices;

Console.WriteLine("IcuMode: " + IcuMode());
Console.WriteLine("Framework: " + RuntimeInformation.FrameworkDescription);
Console.WriteLine("Culture EndsWith: " + "Code\uFFFD".EndsWith("Code", StringComparison.CurrentCulture));

static bool IcuMode()
{
    SortVersion sortVersion = CultureInfo.InvariantCulture.CompareInfo.Version;
    byte[] bytes = sortVersion.SortId.ToByteArray();
    int version = bytes[3] << 24 | bytes[2] << 16 | bytes[1] << 8 | bytes[0];
    return version != 0 && version == sortVersion.FullVersion;
}

Windows + ICU enabled:

IcuMode: True
Framework: .NET 8.0.0
Culture EndsWith: False

Linux + macOS + Windows ICU enabled:

IcuMode: True
Framework: .NET 8.0.0
Culture EndsWith: False

IcuMode: False
Framework: .NET 7.0.14
Culture EndsWith: True

To be clear, culture-based handling of the replacement character seems consistent between .NET 7 and .NET 8 (i.e. ICU doesn't ignore the replacement character the way NLS does), but it means the above behavior change is perhaps more impactful on ICU runtimes than on NLS ones.

I found this with an Assert.EndsWith xUnit assertion. I will just add an Ordinal comparison to the assertion to make it consistent across platforms.
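A minimal sketch of why the Ordinal comparison is platform-independent, using String.EndsWith directly rather than the xUnit assertion (the variable names are arbitrary):

```csharp
using System;

string actual = "Code\uFFFD";

// Ordinal comparison looks at UTF-16 code units only, so U+FFFD is never ignored.
bool ordinal = actual.EndsWith("Code", StringComparison.Ordinal);
Console.WriteLine("Ordinal EndsWith: " + ordinal); // Ordinal EndsWith: False

// Culture-sensitive comparison depends on the globalization backend (ICU or NLS),
// so no expected output is given for this line.
bool cultural = actual.EndsWith("Code", StringComparison.CurrentCulture);
Console.WriteLine("Culture EndsWith: " + cultural);
```

With Ordinal, the result does not depend on whether the runtime uses ICU or NLS, which is what makes the assertion behave the same everywhere.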

Nov 15 '23 20:11 joelverhagen

@stephentoub Can you take a look at this issue and see if it should be documented as a breaking change?

@GrabYourPitchforks, does this look familiar?

Dec 11 '23 17:12 stephentoub

Likely caused by https://github.com/dotnet/runtime/pull/69888.

Jul 17 '24 07:07 KalleOlaviNiemitalo
