BenchmarkDotNet icon indicating copy to clipboard operation
BenchmarkDotNet copied to clipboard

StackOverflow when adding empty IterationSetup method

Open olstakh opened this issue 5 months ago • 6 comments

Hi. I'm trying to test performance of some ILogger extensions that use a lot of lambdas.

Benchmarks has the following structure

[HtmlExporter]
[MarkdownExporter]
[MemoryDiagnoser]
[MaxColumn, MinColumn, MeanColumn]
public class LoggingBenchmark
{
    MyLogger _logger = new();

    [GlobalSetup]
    public void GlobalSetup()
    {
        _logger = new MyLogger();
    }

    // [IterationSetup]
    public void IterationSetup()
    {
    }

    [Params(1, 10, 100, 1000)]
    public int Depth { get; set; }

    [Benchmark]
    public Task<int> Logger_ExecuteAsync()
    {
        return Helper.RunRecursively_ExecuteAsync(_logger, Depth);
    }
}

The helper method is the equivalent of

Task<int> RunRecursively_ExecuteAsync(ILogger logger, int depth)
{
   return logger.ExecuteAsync(async () =>
   {
       return depth == 0 ? 42 : await RunRecursively_ExecuteAsync(logger, depth - 1);
   })
}

ExecuteAsync is just an extension that does try/catch around async action and logs some stuff

For some reason i'm getting stack overflow exception when uncommenting // [IterationSetup] attribute, even though the method is empty. Any immediate thoughts about the difference, before i invest into making full repro? Looks a bit odd on my end

StackOverflow is obviously for the Depth = 1000 case, runs out of stack arfet 800 or so.

BenchmarkDotNet version: 0.14.0

olstakh avatar May 29 '25 22:05 olstakh

The difference between two runs that i've noticed (with and without IterationSetup attribute) is that with the attribute - invocation count is 1, and so is unroll factor. Not sure if it matters.

Also only without the attribute i can see in the logs WorkloadPilot stage and different amount of operations.

Without attribute:

// Benchmark: LoggingBenchmark.Logger_ExecuteAsync: MediumRun(Toolchain=InProcessEmitToolchain, IterationCount=15, LaunchCount=2, WarmupCount=10) [Depth=500]
// *** Execute ***
// Launch: 1 / 2

// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.16 (8.0.1625.21506), X64 RyuJIT AVX2
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: Job-TQVYQD(Toolchain=InProcessEmitToolchain, IterationCount=15, LaunchCount=2, WarmupCount=10)

OverheadJitting  1: 1 op, 173300.00 ns, 173.3000 us/op
WorkloadJitting  1: 1 op, 1095800.00 ns, 1.0958 ms/op

OverheadJitting  2: 16 op, 391800.00 ns, 24.4875 us/op
WorkloadJitting  2: 16 op, 10248000.00 ns, 640.5000 us/op

WorkloadPilot    1: 16 op, 9558400.00 ns, 597.4000 us/op
WorkloadPilot    2: 32 op, 17678900.00 ns, 552.4656 us/op
WorkloadPilot    3: 64 op, 35436300.00 ns, 553.6922 us/op
WorkloadPilot    4: 128 op, 72322400.00 ns, 565.0188 us/op
WorkloadPilot    5: 256 op, 140979700.00 ns, 550.7020 us/op
WorkloadPilot    6: 512 op, 276452500.00 ns, 539.9463 us/op
WorkloadPilot    7: 1024 op, 560274300.00 ns, 547.1429 us/op

OverheadWarmup   1: 1024 op, 3900.00 ns, 3.8086 ns/op
OverheadWarmup   2: 1024 op, 2100.00 ns, 2.0508 ns/op
OverheadWarmup   3: 1024 op, 2000.00 ns, 1.9531 ns/op
OverheadWarmup   4: 1024 op, 2100.00 ns, 2.0508 ns/op
OverheadWarmup   5: 1024 op, 2100.00 ns, 2.0508 ns/op
OverheadWarmup   6: 1024 op, 2100.00 ns, 2.0508 ns/op
OverheadWarmup   7: 1024 op, 2100.00 ns, 2.0508 ns/op

...

With the IterationSetup attribute present:

// Benchmark: LoggingBenchmark.Logger_ExecuteAsync: MediumRun(Toolchain=InProcessEmitToolchain, InvocationCount=1, IterationCount=15, LaunchCount=2, UnrollFactor=1, WarmupCount=10) [Depth=500]
// *** Execute ***
// Launch: 1 / 2

// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.16 (8.0.1625.21506), X64 RyuJIT AVX2
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: Job-YNTODD(Toolchain=InProcessEmitToolchain, InvocationCount=1, IterationCount=15, LaunchCount=2, UnrollFactor=1, WarmupCount=10)

OverheadJitting  1: 1 op, 171300.00 ns, 171.3000 us/op
WorkloadJitting  1: 1 op, 2898500.00 ns, 2.8985 ms/op

OverheadWarmup   1: 1 op, 1800.00 ns, 1.8000 us/op
OverheadWarmup   2: 1 op, 600.00 ns, 600.0000 ns/op
OverheadWarmup   3: 1 op, 600.00 ns, 600.0000 ns/op
OverheadWarmup   4: 1 op, 400.00 ns, 400.0000 ns/op
OverheadWarmup   5: 1 op, 500.00 ns, 500.0000 ns/op
OverheadWarmup   6: 1 op, 500.00 ns, 500.0000 ns/op

OverheadActual   1: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual   2: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual   3: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual   4: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual   5: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual   6: 1 op, 600.00 ns, 600.0000 ns/op
OverheadActual   7: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual   8: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual   9: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual  10: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual  11: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual  12: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual  13: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual  14: 1 op, 500.00 ns, 500.0000 ns/op
OverheadActual  15: 1 op, 500.00 ns, 500.0000 ns/op

...

Notice different job params, WorkloadPilot stage and different operation count (1024 op vs 1 op) So stack overflow is received in the second case (as well as actual benchmark results being different), all because due to the presence / absence of IterationSetup attribute on an empty method that does nothing.

Curious what i can try to narrow it down and get consistency

olstakh avatar May 29 '25 22:05 olstakh

@olstakh Can you share the full benchmark implementation? I can try to debug it to see what's going on.

timcassell avatar May 31 '25 15:05 timcassell

It currently has a reference to Microsoft.PowerApps.CoreFramework, to which i'm not sure you have access. It has implementation of ExecuteAsync extension. I will spend some time trying to extract relevant code from there to have an isolated code. Meanwhile if you want me to try something on my end - will do.

olstakh avatar May 31 '25 21:05 olstakh

@timcassell here's the repro:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;

namespace Engine.BenchmarkTests.Tests;

[InProcess]
[HtmlExporter]
[MarkdownExporter]
[JsonExporterAttribute.Brief]
[MemoryDiagnoser]
[MaxColumn, MinColumn, MeanColumn]
public class LoggingBenchmark
{
    [IterationSetup] // Comment this out and no SO
    public void IterationSetup()
    {
    }

    [Params(1, 1000)] // Remove param 1 and SO regardless of IterationSetup attribute
    public int Depth { get; set; }

    [Benchmark]
    public Task<int> Logger_ExecuteAsync()
    {
        return Helper.RunRecursively_ExecuteAsync(NullLogger.Instance, Depth, 1);
    }
}
static class Helper
{
    public static Task<int> RunRecursively_ExecuteAsync(ILogger logger, int depth, int currentResult)
    {
        return logger.ExecuteAsync2(async () =>
        {
            if (depth == 0)
            {
                return currentResult;
            }
            return await RunRecursively_ExecuteAsync(logger, depth - 1, currentResult + 1);
        });
    }

    public static async Task<TResult> ExecuteAsync2<TResult>(this ILogger logger, Func<Task<TResult>> func)
    {
        return await logger.ExecuteAsync2(300, func);
    }


    public static async Task<T> ExecuteAsync2<T>(this ILogger logger, EventId eventId, Func<Task<T>> func)
    {
        var activityType = "ExecuteAsync2";
        using (logger.BeginScope(new Dictionary<string, object>
        {
            { "ActivityType", activityType.ToString() },
            { "EventId", eventId.Id },
            { "EventName", eventId.Name }
        }))
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            try
            {
                logger.Log(LogLevel.Information, eventId, "Hi", null, null);
                var result = await func();
                logger.Log(LogLevel.Information, eventId, "Bye", null, null);
                return result;
            }
            catch (Exception exception)
            {
                logger.Log(LogLevel.Information, eventId, "exception", exception, null);
                throw;
            }
        }
    }
}

csproj file:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <OutputType>Exe</OutputType>
  </PropertyGroup>
  <PropertyGroup>
    <PlatformTarget>AnyCPU</PlatformTarget>
    <DebugType>pdbonly</DebugType>
    <DebugSymbols>true</DebugSymbols>
    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
    <Optimize>true</Optimize>
    <Configuration>Release</Configuration>
    <IsPackable>false</IsPackable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" VersionOverride="0.15.0" />
    <PackageReference Include="BenchmarkDotNet.Diagnostics.Windows" VersionOverride="0.15.0" />
    <PackageReference Include="Microsoft.Extensions.Logging" VersionOverride="9.0.5" />
    <PackageReference Include="Microsoft.Extensions.Logging.Abstractions" VersionOverride="9.0.5" />
  </ItemGroup>
</Project>

Observations:

  1. Having [IterationSetup] attribute makes benchmarks run very fast (and SO right away)
  2. Not having the attribute takes awhile to run (even for Depth = 1)
  3. Not having Depth = 1 in setup overflows in both cases (with / without an attribute)
  4. [InProcess] emit chain doesn't matter (reproes without it too)

Not sure if machine specs matter, may be 1000 will be different constant for other setups to repro. Here's mine:

BenchmarkDotNet v0.15.0, Windows 11 (10.0.26100.4061/24H2/2024Update/HudsonValley)
12th Gen Intel Core i7-1265U 2.70GHz, 1 CPU, 12 logical and 10 physical cores
.NET SDK 9.0.106
  [Host] : .NET 8.0.16 (8.0.1625.21506), X64 RyuJIT AVX2

olstakh avatar Jun 04 '25 04:06 olstakh

Every recursive method can fail with StackOverflow when the recursion depth is too deep. For a value like 1000 and async method I am not surprised, having an [IterationSetup] may be simply adding one more level of depth.

adamsitnik avatar Jun 04 '25 12:06 adamsitnik

@adamsitnik it does sound reasonable, however

  1. Having [IterationSetup] makes benchmarks behave very different (i.e. if depth is made 500 so no stack overflow, every stage is using 1 op, while without it it's 8K op, so with the attribute this whole test executes within a few seconds, while without it it takes few minutes)
  2. Removing Params(1) (meaning only depth = 1000 is left) makes the run without the attribute also stack overflow. So just adding another parameter lets the benchmark succeed (without the attribute, while still failing with it)

olstakh avatar Jun 04 '25 18:06 olstakh