
[Windows] Regression in few TechEmpower benchmarks

Open adamsitnik opened this issue 4 years ago • 28 comments

I wanted to provide some nice benchmark results for @maryamariyan who is currently writing the blog post about Microsoft.Extensions* improvements we did for .NET 6.

To my surprise, the CachingPlatform-intel-win has regressed from 290k+ to 270k RPS.

Plaintext-intel-win has also regressed from 540k to 500k:

[screenshot: benchmark results]

I've taken a very quick look at the profiles and noticed something that I've not seen before: a LOT of time spent in very expensive event logging:

[screenshot: CPU profile]

Repro:

git clone https://github.com/aspnet/Benchmarks.git repro
crank --config .\repro\scenarios\platform.benchmarks.yml --scenario caching --profile aspnet-perf-win --application.collect true

@sebastienros @davidfowl: Do we know about these regressions and what caused them? Is the expensive logging always enabled, or only when profiling? How can I disable it?

cc @jeffhandley

adamsitnik avatar May 07 '21 09:05 adamsitnik

I'm pretty sure it's only enabled when you turn profiling on.

davidfowl avatar May 07 '21 09:05 davidfowl

What's the stack before eventsource? What method is doing the emitting?

benaadams avatar May 07 '21 10:05 benaadams

@benaadams [screenshot: call stack]

adamsitnik avatar May 07 '21 12:05 adamsitnik

Looks like a runtime issue? It's FrameworkEventSource.Log.ThreadPoolEnqueueWorkObject, which is gated on loggingEnabled && FrameworkEventSource.Log.IsEnabled() (ThreadPoolWorkQueue.cs#L514-L515).

loggingEnabled itself is set in ThreadPoolWorkQueue.cs#L440-L443:

loggingEnabled = FrameworkEventSource.Log.IsEnabled(EventLevel.Verbose, FrameworkEventSource.Keywords.ThreadPool | FrameworkEventSource.Keywords.ThreadTransfer);
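
For reference, the enqueue-side check at the first link is roughly the following (paraphrased from the linked ThreadPoolWorkQueue source, not copied verbatim):

// Inside ThreadPoolWorkQueue.Enqueue: the event is only emitted when a listener has
// enabled FrameworkEventSource at Verbose with the ThreadPool/ThreadTransfer keywords.
if (loggingEnabled && FrameworkEventSource.Log.IsEnabled())
    FrameworkEventSource.Log.ThreadPoolEnqueueWorkObject(callback);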

I did have an issue with VS triggering machine-wide verbose listening on FrameworkEventSource (https://github.com/dotnet/runtime/issues/49592), but I assume that's not the issue on Citrine?

benaadams avatar May 07 '21 12:05 benaadams

@adamsitnik the drop in March you are pointing at is due to the Intel microcode updates for Spectre that were pushed out by Windows Update.

sebastienros avatar May 07 '21 13:05 sebastienros

Still probably shouldn't be logging ThreadPool events at EventLevel.Verbose :)

benaadams avatar May 07 '21 14:05 benaadams

How much micro benchmark coverage do we have for Extensions - especially logging?

danmoseley avatar May 07 '21 14:05 danmoseley

How much micro benchmark coverage do we have for Extensions - especially logging?

From the traces this is specifically EventSource, which it looks like people are running benchmarks for in https://github.com/dotnet/runtime/pull/52092, though I don't know if they are in the perf repo (it probably shouldn't be logging anything for these events unless it was switched on in PerfView specifically).

benaadams avatar May 07 '21 14:05 benaadams

@sebastienros how can I disable the events to get a clean trace file?

adamsitnik avatar May 07 '21 15:05 adamsitnik

How much micro benchmark coverage do we have for Extensions

I would say that it's OK; we have dedicated micro benchmarks for Caching, DI, Http, Logging, and Primitives.

especially logging?

35 benchmarks: https://github.com/dotnet/performance/tree/main/src/benchmarks/micro/libraries/Microsoft.Extensions.Logging

but here the problematic logging is ETW logging, not Microsoft.Extensions.Logging

adamsitnik avatar May 07 '21 15:05 adamsitnik

@adamsitnik if this can be set as perfview arguments, here is the one to change:

--[JOB].collectArguments <arguments>                                     

Native traces arguments, default is "BufferSizeMB=1024;CircularMB=4096;TplEvents=None", other suggested values: "BufferSizeMB=1024;CircularMB=4096;ThreadTime", "BufferSizeMB=1024;CircularMB=4096;GcOnly"
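
For example (hypothetical invocation, reusing the repro's scenario and profile), one of the suggested presets can be passed like this:

crank --config .\repro\scenarios\platform.benchmarks.yml --scenario caching --profile aspnet-perf-win --application.collect true --application.collectArguments "BufferSizeMB=1024;CircularMB=4096;GcOnly"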

sebastienros avatar May 07 '21 16:05 sebastienros

So the defaults shouldn't be triggering EventLevel.Verbose with FrameworkEventSource.Keywords.ThreadPool | FrameworkEventSource.Keywords.ThreadTransfer; is it perhaps being triggered accidentally?
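
For illustration, any ETW session or in-process EventListener that enables FrameworkEventSource at Verbose with those keywords will flip the loggingEnabled gate. A minimal in-process sketch (keyword values assumed, shown only to make the mechanism concrete):

using System.Diagnostics.Tracing;

sealed class ThreadPoolEventSpy : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "System.Diagnostics.Eventing.FrameworkEventSource")
        {
            // 0x2 = ThreadPool, 0x10 = ThreadTransfer (keyword values assumed for illustration).
            EnableEvents(source, EventLevel.Verbose, (EventKeywords)(0x2 | 0x10));
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        // With the gate flipped, every queued work item lands here -- millions of events/sec under load.
    }
}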

benaadams avatar May 07 '21 17:05 benaadams

cc @brianrob

davidfowl avatar May 08 '21 07:05 davidfowl

Yeah, this looks like it's due to the fact that profiling is enabled. You can confirm that something else isn't causing this by explicitly disabling these events. With PerfView, try /ClrEvents:None /TPLEvents:None. If it still shows up, then something else is enabling them.
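
A direct PerfView invocation with those switches might look something like this (sketch only; exact switch names and casing depend on the local PerfView version):

PerfView.exe collect /AcceptEULA /NoGui /ClrEvents:None /TplEvents:None trace.etl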

@benaadams, there is an incoming change to address the issue that you saw with VS, that will limit the collection to just devenv.exe.

brianrob avatar May 10 '21 01:05 brianrob

Yeah, this looks like it's due to the fact that profiling is enabled.

Should it be recording every queued ThreadPool item by default (i.e. Verbose)? That will be millions of events per second.

benaadams avatar May 10 '21 01:05 benaadams

No, it should not be. I think that this environment uses a cached version of PerfView that used to do this, but the latest version should not.

brianrob avatar May 10 '21 02:05 brianrob

With Perfview, try /ClrEvents:None /TPLEvents:None

I have tried that:

crank --config .\scenarios\platform.benchmarks.yml --scenario caching --profile aspnet-citrine-win --application.collect true --application.collectArguments "BufferSizeMB=1024;CircularMB=4096;TplEvents=None;ClrEvents=None"

But for some reason, it ends up with more than 90% of the stacks broken:

[screenshot: trace with broken stacks]

I've tried to extend the rundown time, but it did not help:

 --application.collectArguments "BufferSizeMB=1024;CircularMB=4096;TplEvents=None;ClrEvents=None;MinRundownTime=120"

All I get are native call stacks.

I think that this environment uses a cached version of PerfView that used to do this, but the latest version should not.

@sebastienros could you please update PerfView to latest version?

adamsitnik avatar May 10 '21 06:05 adamsitnik

Updated from 2.0.66 to 2.0.68.

sebastienros avatar May 10 '21 14:05 sebastienros

@adamsitnik, sorry, that's because the command I gave you shut off unwind info publishing. Try /ClrEvents:JitSymbols. Also, it looks like @sebastienros has updated to the latest version of PerfView, so it might be worth just trying the default collection args.

brianrob avatar May 10 '21 14:05 brianrob

Updated from 2.0.66 to 2.0.68

After the update, I get a warning about 90%+ of the stacks being BROKEN. After resolving the symbols, I can see that EventSource.WriteEvent is gone from the trace (I don't know if it's really gone or just hidden by the broken stacks), but there is a different problem related to ETW.

More than 17% of total CPU time is spent in System.private.corelib!System.Int32 Interop+Advapi32::EventWriteTransfer_PInvoke(System.Int64, System.Diagnostics.Tracing.EventDescriptor&, System.Guid*, System.Guid*, System.Int32, System.Diagnostics.Tracing.EventProvider+EventData*), which spends most of the time in ntoskrnl!ExRaiseDatatypeMisalignment

@brianrob is this expected?

[screenshot: CPU profile]

The versions:

| .NET Core SDK Version | 6.0.100-preview.5.21260.9       |
| ASP.NET Core Version  | 6.0.0-preview.5.21261.1+743828a |
| .NET Runtime Version  | 6.0.0-preview.5.21260.8+05b646c |

adamsitnik avatar May 11 '21 12:05 adamsitnik

With TplEvents=None;ClrEvents:JitSymbols I still get a complaint about 90%+ of the stacks being BROKEN, but ETW is gone from the trace.

--application.collectArguments "BufferSizeMB=1024;CircularMB=4096;TplEvents=None;ClrEvents:JitSymbols"

adamsitnik avatar May 11 '21 13:05 adamsitnik

@sebastienros could you please increase the timeout? I wanted to check if it repros on Citrine, but I am getting the following error:

[03:06:17.163] Downloading trace file application.05-11-15-03-53.etl.zip ...
[03:07:57.239] The trace was not captured on the server: System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.IO.Compression.DeflateStream.CopyToStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.CopyToAsyncCore(Stream destination, CancellationToken cancellationToken)
   at System.IO.Compression.DeflateStream.CopyToStream.CopyFromSourceToDestinationAsync()
   at System.Net.Http.DecompressionHandler.DecompressedContent.SerializeToStreamAsync(Stream stream, TransportContext context, CancellationToken cancellationToken)
   at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Microsoft.Crank.Controller.WebUtils.DownloadFileWithProgressAsync(HttpClient httpClient, String uri, String serverJobUri, String destinationFileName) in /_/src/Microsoft.Crank.Controller/WebUtils.cs:line 52
   at Microsoft.Crank.Controller.JobConnection.DownloadTraceAsync() in /_/src/Microsoft.Crank.Controller/JobConnection.cs:line 665

adamsitnik avatar May 11 '21 13:05 adamsitnik

asp-perf-win didn't have the PerfView update; it does now.

sebastienros avatar May 11 '21 16:05 sebastienros

@adamsitnik this is my log, can you share yours to understand the timelines?

[09:18:38.405] Stopping job 'load' ...
[09:18:39.523] Deleting job 'load' ...
[09:18:39.570] Stopping job 'application' ...
[09:18:40.721] Server is collecting trace file, this can take up to 1 minute
..........................................................................................................................
[09:20:47.191] Downloading trace file application.05-11-09-18-40.etl.zip ...
445,290 KB / 445,290 KB (100%)
[09:22:25.443] Deleting job 'application' ...
[09:22:25.481] Stopping job 'db' ...
[09:22:27.703] Deleting job 'db' ...

sebastienros avatar May 11 '21 16:05 sebastienros

Tip: use --[JOB].collectStartup false to exclude the cache-population phase that happens before the app accepts requests. It takes around 10s, so I assume it will slim down the trace quite a bit.
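
For instance (a hypothetical combination of the arguments discussed above):

crank --config .\scenarios\platform.benchmarks.yml --scenario caching --profile aspnet-citrine-win --application.collect true --application.collectStartup false --application.collectArguments "BufferSizeMB=1024;CircularMB=4096;TplEvents=None;ClrEvents:JitSymbols"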

sebastienros avatar May 11 '21 16:05 sebastienros

I've updated crank and run the benchmarks again. I am still seeing the same issue:

[screenshot: CPU profile]

adamsitnik avatar May 11 '21 16:05 adamsitnik

Tracing is throwing exceptions for some reason? https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/nf-ntddk-exraisedatatypemisalignment

The ExRaiseDatatypeMisalignment routine can be used with structured exception handling to throw a driver-determined exception for a misaligned data type that occurs when a driver processes I/O requests.

benaadams avatar May 11 '21 17:05 benaadams

I've narrowed down the regression to https://github.com/dotnet/runtime/pull/50778 and opened an issue in the runtime repo: https://github.com/dotnet/runtime/issues/52640

I am not closing this issue yet as it would be great to clarify why ExRaiseDatatypeMisalignment happens to take so much time on Windows (cc @brianrob)

adamsitnik avatar May 12 '21 11:05 adamsitnik