[Windows] Regression in a few TechEmpower benchmarks
I wanted to provide some nice benchmark results for @maryamariyan, who is currently writing the blog post about the Microsoft.Extensions* improvements we made for .NET 6.
To my surprise, the CachingPlatform-intel-win scenario has regressed from 290k+ to 270k RPS.
Plaintext-intel-win has also regressed, from 540k to 500k RPS:

I've taken a very quick look at the profiles and noticed something that I've not seen before: a LOT of time spent in very expensive event logging:

Repro:
git clone https://github.com/aspnet/Benchmarks.git repro
crank --config .\repro\scenarios\platform.benchmarks.yml --scenario caching --profile aspnet-perf-win --application.collect true
@sebastienros @davidfowl: Do we know about these regressions and what has caused them? Is the expensive logging always enabled, or only when profiling? How can I disable it?
cc @jeffhandley
I'm pretty sure it's only enabled when you turn profiling on.
What's the stack before EventSource? What method is doing the emitting?
@benaadams

Looks like a runtime issue? It's FrameworkEventSource.Log.ThreadPoolEnqueueWorkObject, which is gated on loggingEnabled && FrameworkEventSource.Log.IsEnabled() (ThreadPoolWorkQueue.cs#L514-L515).
loggingEnabled is set in ThreadPoolWorkQueue.cs#L440-L443:
loggingEnabled = FrameworkEventSource.Log.IsEnabled(EventLevel.Verbose, FrameworkEventSource.Keywords.ThreadPool | FrameworkEventSource.Keywords.ThreadTransfer);
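In other words, the per-item event is guarded by a cached flag, so it costs almost nothing until a trace session enables the Verbose level with those keywords. A minimal sketch of the same pattern, using a hypothetical event source (the names below are illustrative, not the runtime's actual types):

```csharp
using System.Diagnostics.Tracing;

// Hypothetical event source mirroring the runtime's gating pattern: the per-item event
// is declared at EventLevel.Verbose behind a keyword, so it is free until a collector
// (e.g. an ETW session started by a profiler) enables that level and keyword.
[EventSource(Name = "Demo-WorkQueue")]
sealed class WorkQueueEventSource : EventSource
{
    public static readonly WorkQueueEventSource Log = new WorkQueueEventSource();

    public static class Keywords
    {
        public const EventKeywords ThreadPool = (EventKeywords)0x2;
    }

    [Event(1, Level = EventLevel.Verbose, Keywords = Keywords.ThreadPool)]
    public void WorkItemEnqueued(long workItemId) => WriteEvent(1, workItemId);
}

static class DemoWorkQueue
{
    // Mirrors ThreadPoolWorkQueue.loggingEnabled: evaluated up front rather than on every enqueue.
    private static readonly bool s_loggingEnabled =
        WorkQueueEventSource.Log.IsEnabled(EventLevel.Verbose, WorkQueueEventSource.Keywords.ThreadPool);

    public static void Enqueue(long workItemId)
    {
        // Guarded write: a couple of cheap checks while nothing listens, but once the
        // Verbose/ThreadPool combination is enabled, every enqueue pays the WriteEvent cost.
        if (s_loggingEnabled && WorkQueueEventSource.Log.IsEnabled())
            WorkQueueEventSource.Log.WorkItemEnqueued(workItemId);

        // ... enqueue the actual work item ...
    }
}
```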
I did have an issue with VS triggering machine-wide verbose listening on FrameworkEventSource (https://github.com/dotnet/runtime/issues/49592), but I assume that's not the issue on Citrine?
@adamsitnik the drop in March you are pointing at is due to the Intel microcode updates for Spectre that were pushed via Windows Update.
Still probably shouldn't be logging ThreadPool events at EventLevel.Verbose :)
How much micro benchmark coverage do we have for Extensions - especially logging?
> How much micro benchmark coverage do we have for Extensions - especially logging?
From the traces this is specifically EventSource, which looks like people are running benchmarks for https://github.com/dotnet/runtime/pull/52092, though I don't know if they are in the perf repo (as it probably shouldn't be logging anything for these events unless it was switched on in PerfView specifically).
@sebastienros how can I disable the events to get a clean trace file?
> How much micro benchmark coverage do we have for Extensions
I would say that it's OK; we have dedicated micro benchmarks for Caching, DI, Http, Logging, and Primitives.
> especially logging?
35 benchmarks: https://github.com/dotnet/performance/tree/main/src/benchmarks/micro/libraries/Microsoft.Extensions.Logging
But here the problematic logging is ETW logging, not Microsoft.Extensions.Logging.
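For reference, the benchmarks in that folder follow the usual BenchmarkDotNet shape; the sketch below is a simplified, illustrative example (the class name and measured scenario are made up, not copied from dotnet/performance):

```csharp
using BenchmarkDotNet.Attributes;
using Microsoft.Extensions.Logging;

// Illustrative micro benchmark in the style of the Microsoft.Extensions.Logging
// benchmarks in dotnet/performance (names and scenario are hypothetical).
public class LoggingBenchmarks
{
    private ILoggerFactory _factory = null!;
    private ILogger _logger = null!;

    [GlobalSetup]
    public void Setup()
    {
        // No providers are registered, so the measurement focuses on the
        // Microsoft.Extensions.Logging abstraction overhead itself.
        _factory = LoggerFactory.Create(builder => builder.SetMinimumLevel(LogLevel.Information));
        _logger = _factory.CreateLogger("Benchmark");
    }

    [Benchmark]
    public void LogInformation_WithArguments()
        => _logger.LogInformation("Request {Id} finished in {Elapsed} ms", 42, 3.14);

    [GlobalCleanup]
    public void Cleanup() => _factory.Dispose();
}
```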
@adamsitnik if this can be set as PerfView arguments, here is the one to change:
--[JOB].collectArguments <arguments>
Native traces arguments, default is "BufferSizeMB=1024;CircularMB=4096;TplEvents=None", other suggested values: "BufferSizeMB=1024;CircularMB=4096;ThreadTime", "BufferSizeMB=1024;CircularMB=4096;GcOnly"
So the defaults shouldn't be triggering EventLevel.Verbose; might FrameworkEventSource.Keywords.ThreadPool | FrameworkEventSource.Keywords.ThreadTransfer be getting accidentally triggered?
cc @brianrob
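If it helps to double-check what the collector actually enabled inside the benchmarked process, here is a hedged diagnostic sketch (EventSourceDump is a made-up helper, not part of crank or the scenario) that lists every EventSource and whether anything, an ETW session or an in-process listener, has it enabled at Verbose:

```csharp
using System;
using System.Diagnostics.Tracing;

// Diagnostic sketch: dump every EventSource the process knows about and whether it is
// currently enabled, and enabled at Verbose for any keyword. Calling this once the app
// is under load shows whether the trace session turned on verbose ThreadPool events.
static class EventSourceDump
{
    public static void Print()
    {
        foreach (EventSource source in EventSource.GetSources())
        {
            bool enabled = source.IsEnabled();
            bool verboseAnyKeyword = source.IsEnabled(EventLevel.Verbose, EventKeywords.All);
            Console.WriteLine($"{source.Name,-60} enabled={enabled} verbose={verboseAnyKeyword}");
        }
    }
}
```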
Yeah, this looks like it's due to the fact that profiling is enabled. You can confirm that something else isn't causing this by explicitly disabling these events. With PerfView, try /ClrEvents:None /TPLEvents:None. If it still shows up, then something else is enabling them.
@benaadams, there is an incoming change to address the issue that you saw with VS; it will limit the collection to just devenv.exe.
> Yeah, this looks like it's due to the fact that profiling is enabled.
Should it be recording every queued ThreadPool item by default (i.e. at Verbose)? That will be millions of events per second.
No, it should not be. I think that this environment uses a cached version of PerfView that used to do this, but the latest version should not.
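To put a rough number on the event volume mentioned above, one could count FrameworkEventSource events in-process with an EventListener, outside of a profiling run (attaching the listener itself turns the events on). This is only a sketch; the source name and keyword bits are copied from dotnet/runtime at the time of writing and should be verified against the runtime in use:

```csharp
using System;
using System.Diagnostics.Tracing;
using System.Threading;

// Side-experiment sketch: counts FrameworkEventSource events so the
// "millions of events per second" estimate can be sanity-checked in-process.
// Do not run this while profiling, since enabling the listener flips loggingEnabled on.
sealed class ThreadPoolEventRate : EventListener
{
    private long _count;

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "System.Diagnostics.Eventing.FrameworkEventSource")
        {
            // ThreadPool (0x2) | ThreadTransfer (0x10), mirroring the gating condition
            // quoted from ThreadPoolWorkQueue.cs above.
            EnableEvents(source, EventLevel.Verbose, (EventKeywords)(0x2 | 0x10));
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
        => Interlocked.Increment(ref _count);

    public long EventsPerSecond(TimeSpan window)
    {
        long before = Interlocked.Read(ref _count);
        Thread.Sleep(window);
        long after = Interlocked.Read(ref _count);
        return (long)((after - before) / window.TotalSeconds);
    }
}
```

Constructing the listener while the server is under load and printing EventsPerSecond(TimeSpan.FromSeconds(1)) gives a ballpark figure.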
> With PerfView, try /ClrEvents:None /TPLEvents:None
I have tried that:
crank --config .\scenarios\platform.benchmarks.yml --scenario caching --profile aspnet-citrine-win --application.collect true --application.collectArguments "BufferSizeMB=1024;CircularMB=4096;TplEvents=None;ClrEvents=None"
But for some reason, it ends up with more than 90% of the stacks broken:

I've tried to extend the rundown time, but it did not help:
--application.collectArguments "BufferSizeMB=1024;CircularMB=4096;TplEvents=None;ClrEvents=None;MinRundownTime=120"
All I get are native call stacks.
> I think that this environment uses a cached version of PerfView that used to do this, but the latest version should not.
@sebastienros could you please update PerfView to latest version?
Updated from 2.0.66 to 2.0.68.
@adamsitnik, sorry, that's because the command I gave you shut off unwind info publishing. Try /ClrEvents:JitSymbols. Also, it looks like @sebastienros has updated to the latest version of PerfView, so it might be worth just trying the default collection args.
> Updated from 2.0.66 to 2.0.68
After the update, I get a warning about 90%+ of the stacks being BROKEN. After resolving the symbols, I can see that EventSource.WriteEvent is gone from the trace (I don't know whether it's really gone or just hidden by the broken stacks), but there is a different problem related to ETW.
More than 17% of total CPU time is spent in System.private.corelib!System.Int32 Interop+Advapi32::EventWriteTransfer_PInvoke(System.Int64, System.Diagnostics.Tracing.EventDescriptor&, System.Guid*, System.Guid*, System.Int32, System.Diagnostics.Tracing.EventProvider+EventData*), which spends most of its time in ntoskrnl!ExRaiseDatatypeMisalignment.
@brianrob is this expected?

The versions:

| Component | Version |
| --- | --- |
| .NET Core SDK | 6.0.100-preview.5.21260.9 |
| ASP.NET Core | 6.0.0-preview.5.21261.1+743828a |
| .NET Runtime | 6.0.0-preview.5.21260.8+05b646c |
With TplEvents=None;ClrEvents:JitSymbols I still get a complaint about 90%+ of the stacks being BROKEN, but ETW is gone from the trace.
--application.collectArguments "BufferSizeMB=1024;CircularMB=4096;TplEvents=None;ClrEvents:JitSymbols"
@sebastienros could you please increase the timeout? I wanted to check whether it repros on Citrine, but I am getting the following error:
[03:06:17.163] Downloading trace file application.05-11-15-03-53.etl.zip ...
[03:07:57.239] The trace was not captured on the server: System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.IO.Compression.DeflateStream.CopyToStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.CopyToAsyncCore(Stream destination, CancellationToken cancellationToken)
at System.IO.Compression.DeflateStream.CopyToStream.CopyFromSourceToDestinationAsync()
at System.Net.Http.DecompressionHandler.DecompressedContent.SerializeToStreamAsync(Stream stream, TransportContext context, CancellationToken cancellationToken)
at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
at Microsoft.Crank.Controller.WebUtils.DownloadFileWithProgressAsync(HttpClient httpClient, String uri, String serverJobUri, String destinationFileName) in /_/src/Microsoft.Crank.Controller/WebUtils.cs:line 52
at Microsoft.Crank.Controller.JobConnection.DownloadTraceAsync() in /_/src/Microsoft.Crank.Controller/JobConnection.cs:line 665
asp-perf-win didn't have the PerfView update; now it has it.
@adamsitnik this is my log; can you share yours so we can understand the timelines?
[09:18:38.405] Stopping job 'load' ...
[09:18:39.523] Deleting job 'load' ...
[09:18:39.570] Stopping job 'application' ...
[09:18:40.721] Server is collecting trace file, this can take up to 1 minute
..........................................................................................................................
[09:20:47.191] Downloading trace file application.05-11-09-18-40.etl.zip ...
445,290 KB / 445,290 KB (100%)
[09:22:25.443] Deleting job 'application' ...
[09:22:25.481] Stopping job 'db' ...
[09:22:27.703] Deleting job 'db' ...
Tip: use --[JOB].collectStartup false to exclude the cache population phase that happens before the app accepts requests. It takes around 10s, so I assume it will free up quite a bit of the trace.
I've updated crank and run the benchmarks again. I am still seeing the same issue:

Tracing is throwing exceptions for some reason? https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/nf-ntddk-exraisedatatypemisalignment
> The ExRaiseDatatypeMisalignment routine can be used with structured exception handling to throw a driver-determined exception for a misaligned data type that occurs when a driver processes I/O requests.
I've narrowed down the regression to https://github.com/dotnet/runtime/pull/50778 and opened an issue in the runtime repo: https://github.com/dotnet/runtime/issues/52640
I am not closing this issue yet, as it would be great to clarify why ExRaiseDatatypeMisalignment happens to take so much time on Windows (cc @brianrob).