BenchmarkDotNet
StackOverflow when adding empty IterationSetup method
Hi. I'm trying to test the performance of some ILogger extensions that use a lot of lambdas.
The benchmark has the following structure:
[HtmlExporter]
[MarkdownExporter]
[MemoryDiagnoser]
[MaxColumn, MinColumn, MeanColumn]
public class LoggingBenchmark
{
    MyLogger _logger = new();

    [GlobalSetup]
    public void GlobalSetup()
    {
        _logger = new MyLogger();
    }

    // [IterationSetup]
    public void IterationSetup()
    {
    }

    [Params(1, 10, 100, 1000)]
    public int Depth { get; set; }

    [Benchmark]
    public Task<int> Logger_ExecuteAsync()
    {
        return Helper.RunRecursively_ExecuteAsync(_logger, Depth);
    }
}
The helper method is the equivalent of:
Task<int> RunRecursively_ExecuteAsync(ILogger logger, int depth)
{
    return logger.ExecuteAsync(async () =>
    {
        return depth == 0 ? 42 : await RunRecursively_ExecuteAsync(logger, depth - 1);
    });
}
ExecuteAsync is just an extension method that wraps the async action in a try/catch and logs some stuff.
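Roughly, it's along these lines (a sketch; the real extension lives in a package I can't share, so the names and exact log calls are approximate):

// Rough approximation of the ExecuteAsync extension (not the real code):
// wrap the async action in try/catch and log around it.
public static async Task<T> ExecuteAsync<T>(this ILogger logger, Func<Task<T>> func)
{
    try
    {
        logger.LogInformation("start");
        var result = await func();
        logger.LogInformation("end");
        return result;
    }
    catch (Exception exception)
    {
        logger.LogError(exception, "failed");
        throw;
    }
}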
For some reason I'm getting a StackOverflowException when I uncomment the // [IterationSetup] attribute, even though the method is empty.
Any immediate thoughts about the difference before I invest in making a full repro? It looks a bit odd on my end.
The StackOverflow obviously happens for the Depth = 1000 case; it runs out of stack after a depth of 800 or so.
BenchmarkDotNet version: 0.14.0
The difference I've noticed between the two runs (with and without the IterationSetup attribute) is that with the attribute the invocation count is 1, and so is the unroll factor. Not sure if it matters.
Also, only without the attribute do I see the WorkloadPilot stage in the logs, and a different number of operations.
Without the attribute:
// Benchmark: LoggingBenchmark.Logger_ExecuteAsync: MediumRun(Toolchain=InProcessEmitToolchain, IterationCount=15, LaunchCount=2, WarmupCount=10) [Depth=500]
// *** Execute ***
// Launch: 1 / 2
// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.16 (8.0.1625.21506), X64 RyuJIT AVX2
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: Job-TQVYQD(Toolchain=InProcessEmitToolchain, IterationCount=15, LaunchCount=2, WarmupCount=10)
OverheadJitting 1: 1 op, 173300.00 ns, 173.3000 us/op
WorkloadJitting 1: 1 op, 1095800.00 ns, 1.0958 ms/op
OverheadJitting 2: 16 op, 391800.00 ns, 24.4875 us/op
WorkloadJitting 2: 16 op, 10248000.00 ns, 640.5000 us/op
WorkloadPilot 1: 16 op, 9558400.00 ns, 597.4000 us/op
WorkloadPilot 2: 32 op, 17678900.00 ns, 552.4656 us/op
WorkloadPilot 3: 64 op, 35436300.00 ns, 553.6922 us/op
WorkloadPilot 4: 128 op, 72322400.00 ns, 565.0188 us/op
WorkloadPilot 5: 256 op, 140979700.00 ns, 550.7020 us/op
WorkloadPilot 6: 512 op, 276452500.00 ns, 539.9463 us/op
WorkloadPilot 7: 1024 op, 560274300.00 ns, 547.1429 us/op
OverheadWarmup 1: 1024 op, 3900.00 ns, 3.8086 ns/op
OverheadWarmup 2: 1024 op, 2100.00 ns, 2.0508 ns/op
OverheadWarmup 3: 1024 op, 2000.00 ns, 1.9531 ns/op
OverheadWarmup 4: 1024 op, 2100.00 ns, 2.0508 ns/op
OverheadWarmup 5: 1024 op, 2100.00 ns, 2.0508 ns/op
OverheadWarmup 6: 1024 op, 2100.00 ns, 2.0508 ns/op
OverheadWarmup 7: 1024 op, 2100.00 ns, 2.0508 ns/op
...
With the IterationSetup attribute present:
// Benchmark: LoggingBenchmark.Logger_ExecuteAsync: MediumRun(Toolchain=InProcessEmitToolchain, InvocationCount=1, IterationCount=15, LaunchCount=2, UnrollFactor=1, WarmupCount=10) [Depth=500]
// *** Execute ***
// Launch: 1 / 2
// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.16 (8.0.1625.21506), X64 RyuJIT AVX2
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: Job-YNTODD(Toolchain=InProcessEmitToolchain, InvocationCount=1, IterationCount=15, LaunchCount=2, UnrollFactor=1, WarmupCount=10)
OverheadJitting 1: 1 op, 171300.00 ns, 171.3000 us/op
WorkloadJitting 1: 1 op, 2898500.00 ns, 2.8985 ms/op
OverheadWarmup 1: 1 op, 1800.00 ns, 1.8000 us/op
OverheadWarmup 2: 1 op, 600.00 ns, 600.0000 ns/op
OverheadWarmup 3: 1 op, 600.00 ns, 600.0000 ns/op
OverheadWarmup 4: 1 op, 400.00 ns, 400.0000 ns/op
OverheadWarmup 5: 1 op, 500.00 ns, 500.0000 ns/op
OverheadWarmup 6: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 1: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 2: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 3: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 4: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 5: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 6: 1 op, 600.00 ns, 600.0000 ns/op
OverheadActual 7: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 8: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 9: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 10: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 11: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 12: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 13: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 14: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual 15: 1 op, 500.00 ns, 500.0000 ns/op
...
Notice the different job parameters, the presence of the WorkloadPilot stage, and the different operation count (1024 op vs 1 op).
So the stack overflow happens in the second case (and the actual benchmark results differ as well), all due to the presence or absence of the IterationSetup attribute on an empty method that does nothing.
Curious what I can try to narrow it down and get consistency.
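One thing I can try to separate the two effects (a sketch, assuming the standard config API; not what my current runs use) is to pin InvocationCount and UnrollFactor explicitly, so the job settings are identical with and without the attribute:

// Sketch (assuming the standard BenchmarkDotNet config API): pin the job to the
// same InvocationCount=1 / UnrollFactor=1 that [IterationSetup] forces, so both
// variants run with identical settings and only the setup call itself differs.
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Toolchains.InProcess.Emit;

public class PinnedJobConfig : ManualConfig
{
    public PinnedJobConfig()
    {
        AddJob(Job.MediumRun
            .WithToolchain(InProcessEmitToolchain.Instance)
            .WithInvocationCount(1)
            .WithUnrollFactor(1));
    }
}

// Apply with [Config(typeof(PinnedJobConfig))] on LoggingBenchmark.

If the Depth = 1000 case still only overflows when [IterationSetup] is present under identical settings, the setup call itself would be the suspect rather than the lack of unrolling.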
@olstakh Can you share the full benchmark implementation? I can try to debug it to see what's going on.
It currently has a reference to Microsoft.PowerApps.CoreFramework, which I'm not sure you have access to. That package contains the implementation of the ExecuteAsync extension.
I will spend some time extracting the relevant code from there to get an isolated repro.
Meanwhile, if you want me to try something on my end, I will.
@timcassell here's the repro:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
namespace Engine.BenchmarkTests.Tests;
[InProcess]
[HtmlExporter]
[MarkdownExporter]
[JsonExporterAttribute.Brief]
[MemoryDiagnoser]
[MaxColumn, MinColumn, MeanColumn]
public class LoggingBenchmark
{
    [IterationSetup] // Comment this out and no SO
    public void IterationSetup()
    {
    }

    [Params(1, 1000)] // Remove param 1 and SO regardless of IterationSetup attribute
    public int Depth { get; set; }

    [Benchmark]
    public Task<int> Logger_ExecuteAsync()
    {
        return Helper.RunRecursively_ExecuteAsync(NullLogger.Instance, Depth, 1);
    }
}
static class Helper
{
    public static Task<int> RunRecursively_ExecuteAsync(ILogger logger, int depth, int currentResult)
    {
        return logger.ExecuteAsync2(async () =>
        {
            if (depth == 0)
            {
                return currentResult;
            }
            return await RunRecursively_ExecuteAsync(logger, depth - 1, currentResult + 1);
        });
    }

    public static async Task<TResult> ExecuteAsync2<TResult>(this ILogger logger, Func<Task<TResult>> func)
    {
        return await logger.ExecuteAsync2(300, func);
    }

    public static async Task<T> ExecuteAsync2<T>(this ILogger logger, EventId eventId, Func<Task<T>> func)
    {
        var activityType = "ExecuteAsync2";
        using (logger.BeginScope(new Dictionary<string, object>
        {
            { "ActivityType", activityType.ToString() },
            { "EventId", eventId.Id },
            { "EventName", eventId.Name }
        }))
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            try
            {
                logger.Log(LogLevel.Information, eventId, "Hi", null, null);
                var result = await func();
                logger.Log(LogLevel.Information, eventId, "Bye", null, null);
                return result;
            }
            catch (Exception exception)
            {
                logger.Log(LogLevel.Information, eventId, "exception", exception, null);
                throw;
            }
        }
    }
}
csproj file:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <OutputType>Exe</OutputType>
  </PropertyGroup>
  <PropertyGroup>
    <PlatformTarget>AnyCPU</PlatformTarget>
    <DebugType>pdbonly</DebugType>
    <DebugSymbols>true</DebugSymbols>
    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
    <Optimize>true</Optimize>
    <Configuration>Release</Configuration>
    <IsPackable>false</IsPackable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" VersionOverride="0.15.0" />
    <PackageReference Include="BenchmarkDotNet.Diagnostics.Windows" VersionOverride="0.15.0" />
    <PackageReference Include="Microsoft.Extensions.Logging" VersionOverride="9.0.5" />
    <PackageReference Include="Microsoft.Extensions.Logging.Abstractions" VersionOverride="9.0.5" />
  </ItemGroup>
</Project>
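The snippet above doesn't include an entry point; since the project is an Exe, assume a standard runner along these lines:

// Assumed entry point (not part of the snippet above): standard BenchmarkDotNet runner.
using BenchmarkDotNet.Running;
using Engine.BenchmarkTests.Tests;

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<LoggingBenchmark>();
}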
Observations:
- Having the [IterationSetup] attribute makes the benchmarks run very fast (and SO right away)
- Not having the attribute takes a while to run (even for Depth = 1)
- Not having Depth = 1 in the Params overflows in both cases (with / without the attribute)
- The [InProcess] emit toolchain doesn't matter (it repros without it too)
Not sure if machine specs matter; maybe 1000 will be a different constant on other setups to repro.
Here's mine:
BenchmarkDotNet v0.15.0, Windows 11 (10.0.26100.4061/24H2/2024Update/HudsonValley)
12th Gen Intel Core i7-1265U 2.70GHz, 1 CPU, 12 logical and 10 physical cores
.NET SDK 9.0.106
[Host] : .NET 8.0.16 (8.0.1625.21506), X64 RyuJIT AVX2
Every recursive method can fail with a StackOverflow when the recursion depth is too deep. For a value like 1000 and an async method, I am not surprised; having an [IterationSetup] may simply be adding one more level of depth.
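For illustration, a bare recursive async method with no logging at all shows the same failure once the depth is large enough (a standalone sketch, not taken from the repro):

// Standalone sketch: the synchronous part of each async call runs on the same
// stack before anything is awaited, so deep recursion alone can overflow.
// The exact limit depends on the stack size and the frame size per level.
using System;
using System.Threading.Tasks;

class DeepRecursion
{
    static async Task<int> Recurse(int depth) =>
        depth == 0 ? 42 : 1 + await Recurse(depth - 1);

    static void Main() =>
        // A bare method needs a much larger depth to overflow a default ~1 MB stack.
        Console.WriteLine(Recurse(100_000).GetAwaiter().GetResult());
}

With the ExecuteAsync2 wrapper, every level goes through two more async state machines, a logging scope, and a lambda, so a limit on the order of hundreds is plausible, which matches the overflow reported around a depth of 800.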
@adamsitnik it does sound reasonable, however:
- Having [IterationSetup] makes the benchmarks behave very differently (i.e. if depth is made 500 so there is no stack overflow, every stage uses 1 op, while without the attribute it's 8K op; so with the attribute this whole test executes within a few seconds, while without it it takes a few minutes)
- Removing Params(1) (meaning only depth = 1000 is left) makes the run without the attribute also stack overflow. So just adding another parameter lets the benchmark succeed (without the attribute, while still failing with it)
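One thing I can also try on my end, to take stack size out of the equation entirely (a possible workaround sketched here, not something in my current code), is to run the recursion on a dedicated thread with a larger stack:

// Possible workaround (sketch): run the deep recursion on a thread with a bigger
// stack so the overflow no longer depends on how much stack the harness has
// already consumed. Requires using System.Threading; note it adds thread
// creation/join cost to every invocation.
[Benchmark]
public int Logger_ExecuteAsync_BigStack()
{
    int result = 0;
    var thread = new Thread(
        () => result = Helper.RunRecursively_ExecuteAsync(NullLogger.Instance, Depth, 1)
            .GetAwaiter().GetResult(),
        maxStackSize: 16 * 1024 * 1024); // 16 MB instead of the default ~1 MB
    thread.Start();
    thread.Join();
    return result;
}

It wouldn't explain why the attribute changes things, but it would at least make the Depth = 1000 results comparable.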