BenchmarkDotNet icon indicating copy to clipboard operation
BenchmarkDotNet copied to clipboard

Wildly different results from simply swapping the order of benchmarks

Open space-alien opened this issue 2 years ago • 6 comments

Faced with seemingly inconsistent and inscrutable benchmark results, I followed a hunch and swapped the order of appearance of my two benchmark methods.

The following results are consistently repeatable.

A then B:

|           Method |     Mean |     Error |    StdDev | Ratio |
|----------------- |---------:|----------:|----------:|------:|
|          MethodA | 1.541 ns | 0.0033 ns | 0.0029 ns |  1.00 |
|          MethodB | 3.451 ns | 0.0079 ns | 0.0074 ns |  2.24 |

Swap the order:

|           Method |      Mean |     Error |    StdDev | Ratio | RatioSD |
|----------------- |----------:|----------:|----------:|------:|--------:|
|          MethodB | 4.9731 ns | 0.0951 ns | 0.1057 ns | 21.52 |    0.64 |
|          MethodA | 0.2302 ns | 0.0024 ns | 0.0020 ns |  1.00 |    0.00 |

Nothing has changed between these two runs, aside from swapping the order of the two benchmark methods on the benchmark class.

How should I diagnose what is going on here?

Environment info:
BenchmarkDotNet=v0.13.1
AMD Ryzen 9 5900HX with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.202
  [Host]     : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
  Job-MPHSCA : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT

Runtime=.NET 6.0

I'm running Windows 11, but the output reports Windows 10, so I have stripped that.

space-alien avatar May 11 '22 14:05 space-alien

@space-alien thanks for the report. Could you please share the source code of these benchmarks?

AndreyAkinshin avatar May 11 '22 14:05 AndreyAkinshin

Apologies for the delay; I have been investigating further with a view to producing a minimal repro. However, I think I have identified the issue.

My benchmark-holding class has a static member, whose type has a static constructor that performs some non-trivial initialization.

The type's static constructor is being invoked during the benchmarking runs, skewing the results.

This can be solved by taking control of when the static member is initialized, e.g. by adding the following to the benchmark-holding class:

[GlobalSetup]
public void Warmup()
{
    Type myClass = typeof(ClassWithSlowStaticConstructor);
    System.Runtime.CompilerServices.RuntimeHelpers.RunClassConstructor(myClass.TypeHandle);
}

In my case, using a static member on my benchmark class was unnecessary. Not sure whether this kind of scenario might warrant a warning in the docs?

space-alien avatar May 24 '22 14:05 space-alien

Static constructors are only ran once. So only the first warm-up iteration should measure the impact and subsequent iterations should not. Are you measuring a cold start? A regular run should not include static constructors in the results.

timcassell avatar May 24 '22 15:05 timcassell

That's what I would have expected, and was confusing me too.

I fiddled around a bit more and I now have a repro, although I may have inadvertently stumbled into a different issue...

A slow static constructor seems to break the automatic selection of number of iterations in the pilot phase. In the code below, a microbenchmark only gets benchmarked over a single op, and the results are of course useless. A delay of >10s seems to be the point at which the pilot phase switches to just a single op for the benchmark runs.

Perhaps the pilot phase could handle the possibility of a slow static constructor when determining the optimal number of iterations to run. For example, by ignoring the time taken by the first invocation, or detecting if subsequent invocations take vastly less time, and increasing the number of ops.

Now... while this is interesting, I'm pretty sure this is NOT the cause of the issue I originally experienced - I would have noticed if my method was only being invoked once, and the static constructor in my original code certainly did not take so long to run. I will continue to investigate.

using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNetRepro2;

[DisassemblyDiagnoser]
public class MicroBenchmark
{
    // Instance member:
    ulong _instanceCount;

    // Static member:
    //   Initialized when first accessed, which occurs before the start of the second benchmark: StaticMember()
    //   The initializer for the static field `_unusedStaticField` gets called at the same time.
    //   Which, in turn, invokes the SlowWarmupClass static constructor.
    static ulong _staticCount;

    // If this field is not static, it is instantiated along with the MicroBenchmark class,
    // and the slow static constructor will not affect benchmarking.
    static readonly SlowWarmupClass _unusedStaticField = new();

    [Benchmark]
    public ulong InstanceMember()
    {
        _instanceCount++;
        return _instanceCount;
    }

    [Benchmark]
    public ulong StaticMember()
    {
        _staticCount++;
        return _staticCount;
    }
}

public class SlowWarmupClass
{
    static SlowWarmupClass()
    {
        Console.WriteLine(" <<<<   SlowWarmup static constructor   >>>> ");
        Thread.Sleep(11000); // Simulate non-trivial initialization
    }
}

space-alien avatar May 24 '22 19:05 space-alien

PS: I updated to 0.13.1.1796 on the BDN nightly feed, and the threshold above which we only get one op appears to have increased to 10s, although the issue still remains. I've updated the repro above (and the following comment above):

Perhaps the pilot phase could handle the possibility of a slow static constructor when determining the optimal number of iterations to run. For example, by ignoring the time taken by the first invocation, or detecting if subsequent invocations take vastly less time, and increasing the number of ops.

space-alien avatar May 24 '22 19:05 space-alien

Seems related to #1210

timcassell avatar Aug 16 '23 02:08 timcassell