runtime icon indicating copy to clipboard operation
runtime copied to clipboard

replace ToUtf8 with Encoding.UTF8.GetBytes

Open kasperk81 opened this issue 3 years ago • 6 comments

fixes #75779

kasperk81 avatar Sep 19 '22 13:09 kasperk81

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis See info in area-owners.md if you want to be subscribed.

Issue Details

fixes #75779

Author: kasperk81
Assignees: -
Labels:

area-System.Text.Json

Milestone: -

ghost avatar Sep 19 '22 13:09 ghost

Can we validate performance is as we expect it to be (as good or better than it was before)?

@kasperk81 If you're interested, here are a few instructions on how to benchmark private build performance. The STJ benchmarks can be run using the System.Text.Json* filter in BenchmarkDotNet. I can also do it for you if you prefer.

eiriktsarpalis avatar Sep 19 '22 14:09 eiriktsarpalis

Can we validate performance is as we expect it to be (as good or better than it was before)?

from one set of performance tests, i'm getting mixed results.

summary: better: 5, geomean: 1.087 worse: 4, geomean: 1.115 total diff: 9

Slower diff/base Base Median (ns) Diff Median (ns) Modality
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf8(Formatted: False, SkipValidatio 1.16 754.87 876.72
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: False, SkipValidati 1.15 863.18 992.44 bimodal
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf8(Formatted: False, SkipValidatio 1.12 1993853.75 222316.66
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf8(Formatted: False, SkipValidatio 1.04 1911530.87 1982514.57
Faster base/diff Base Median (ns) Diff Median (ns) Modality
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: False, SkipValidati 1.15 832.79 721.12
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: True, SkipValidatio 1.08 1063.42 980.44
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: True, SkipValidatio 1.08 1019.21 940.08
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf8(Formatted: True, SkipValidation 1.07 1003.61 941.22
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf8(Formatted: False, SkipValidatio 1.05 710.38 676.76

kasperk81 avatar Sep 19 '22 15:09 kasperk81

@kasperk81 I'd recommend to pass --envVars DOTNET_ReadyToRun:0 to BDN to disable static PGO as your changes changed layout and made the built-in profile stale. I assume we need to reflect it in the docs (https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/README.md#private-runtime-builds)

EgorBo avatar Sep 19 '22 16:09 EgorBo

I'd recommend to pass --envVars DOTNET_ReadyToRun:0

with envVars, it's clear that diff is slower

summary:
worse: 10, geomean: 1.132
total diff: 10

| Slower                                                                           | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: True, SkipValidatio |      1.29 |       2362864.09 |       3058230.01 |         |
| System.Text.Json.Tests.Perf_Basic.WriteBasicUtf8(Formatted: True, SkipValidation |      1.22 |       2414225.03 |       2944031.19 |         |
| System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: False, SkipValidati |      1.21 |           689.49 |           834.19 |         |
| System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: False, SkipValidati |      1.14 |           748.64 |           856.18 |         |
| System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: True, SkipValidatio |      1.12 |           963.79 |          1079.45 | bimodal |
| System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: True, SkipValidatio |      1.10 |           908.93 |           995.58 |         |
| System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: False, SkipValidati |      1.07 |       1838042.24 |       1969624.88 |         |
| System.Text.Json.Tests.Perf_Basic.WriteBasicUtf8(Formatted: False, SkipValidatio |      1.07 |           634.40 |           677.76 |         |
| System.Text.Json.Tests.Perf_Basic.WriteBasicUtf8(Formatted: True, SkipValidation |      1.06 |           893.44 |           947.61 |         |
| System.Text.Json.Tests.Perf_Basic.WriteBasicUtf8(Formatted: False, SkipValidatio |      1.06 |           707.75 |           750.28 |         |

No Faster results for the provided threshold = 2% and noise filter = 0.3 ns.

kasperk81 avatar Sep 19 '22 16:09 kasperk81

Change seems to be introducing a couple of test failures, see:

https://github.com/dotnet/runtime/pull/75834/checks?check_run_id=8439815257

eiriktsarpalis avatar Sep 20 '22 10:09 eiriktsarpalis

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

ghost avatar Oct 21 '22 21:10 ghost

I've analyzed the test failures and made the following discoveries. Unlike other strings passed to Utf8JsonWriter, comment strings are not escaped, by design:

https://github.com/dotnet/runtime/blob/58a1180ad6421c6f69cf307b57b2496c888585f2/src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteValues.Comment.cs#L45

As a result, the comment writer supports inputs containing unpaired surrogates:

https://github.com/dotnet/runtime/blob/58a1180ad6421c6f69cf307b57b2496c888585f2/src/libraries/System.Text.Json/tests/System.Text.Json.Tests/Utf8JsonWriterTests.cs#L4165

However, in such cases the current implementation will simply write the comment string up to the impacted character and discard everything else

https://github.com/dotnet/runtime/blob/58a1180ad6421c6f69cf307b57b2496c888585f2/src/libraries/System.Text.Json/tests/System.Text.Json.Tests/Utf8JsonWriterTests.cs#L4200

Even though we're testing for this, I don't think anybody could reasonably depend on the current behavior. I would recommend taking a breaking change and have the method throw ArgumentException instead.

eiriktsarpalis avatar Oct 24 '22 18:10 eiriktsarpalis

string before write: "comment is / * valid even with unpaired surrogate \udc00 this part no longer visible" expected: "comment is / * valid even with unpaired surrogate actual: comment is /* valid*//**//**/

actual doesn't make sense, it replaced 30 expected characters with *//**//**/ and eats the space between forward slash and asterisk in / * valid part.

Encoding.UTF8.GetBytes is lacking something that ToUtf8 was handling well (that's the only thing i've replaced). could be an indication of bug in GetBytes that goes beyond this odd comment?

if i delete that comment from the test, it passes though the other comments like "comment is /* valid" are still there

--- a/src/libraries/System.Text.Json/tests/System.Text.Json.Tests/Utf8JsonWriterTests.cs
+++ b/src/libraries/System.Text.Json/tests/System.Text.Json.Tests/Utf8JsonWriterTests.cs
@@ -4162,10 +4162,6 @@ public void WriteCommentsInvalidTextAllowed(bool formatted, bool skipValidation)
             jsonUtf8.WriteCommentValue(comment.AsSpan());
             jsonUtf8.WriteCommentValue(Encoding.UTF8.GetBytes(comment));
 
-            comment = "comment is / * valid even with unpaired surrogate \udc00 this part no longer visible";
-            jsonUtf8.WriteCommentValue(comment);
-            jsonUtf8.WriteCommentValue(comment.AsSpan());
-
             jsonUtf8.Flush();
 
             // Explicitly skipping flushing here
@@ -4197,10 +4193,6 @@ private static string GetCommentExpectedString(bool prettyPrint)
             json.WriteComment(comment);
             json.WriteComment(comment);
 
-            comment = "comment is / * valid even with unpaired surrogate ";
-            json.WriteComment(comment);
-            json.WriteComment(comment);
-
             json.Flush();
 
             return Encoding.UTF8.GetString(ms.ToArray());

kasperk81 avatar Oct 24 '22 19:10 kasperk81