BenchmarkDotNet icon indicating copy to clipboard operation
BenchmarkDotNet copied to clipboard

Improve memory diagnoser accuracy

Open timcassell opened this issue 1 year ago • 17 comments
trafficstars

This doesn't fix the zero-alloc measurement, but it does fix a bug from the threading diagnoser interfering with the results. The other changes give us the highest confidence that any errant measurements are not our fault. The rest is up to the runtime.

Fixes #1542

timcassell avatar Apr 15 '24 05:04 timcassell

It looks like I may have resolved the zero-alloc issue!

New results (using the same benchmark from #1543):

Method Runtime Allocated
Benchmark1 .NET 6.0 2 B
Benchmark2 .NET 6.0 2 B
Benchmark3 .NET 6.0 2 B
Benchmark4 .NET 6.0 2 B
Benchmark5 .NET 6.0 2 B
Benchmark1 .NET 7.0 -
Benchmark2 .NET 7.0 -
Benchmark3 .NET 7.0 -
Benchmark4 .NET 7.0 -
Benchmark5 .NET 7.0 -
Benchmark1 .NET 8.0 -
Benchmark2 .NET 8.0 -
Benchmark3 .NET 8.0 -
Benchmark4 .NET 8.0 -
Benchmark5 .NET 8.0 -

The resolution ended up being moving the GcStats measurement to its own method, and removing GC.Collect() from the allocation measurement. Doing only 1 of those still wasn't getting the expected results.

I was unable to repro the issue in a plain console application, so I'm not sure why GC.Collect() interferes with the results (I would have reported a bug in the runtime repo if I could).

PTAL @adamsitnik

timcassell avatar Apr 15 '24 13:04 timcassell

I added the thread sleep from #1543 in Core runtimes 3.0 to 6.0, and now I'm getting 0 measurements across the board:

Method Runtime Allocated
Benchmark1 .NET 6.0 -
Benchmark2 .NET 6.0 -
Benchmark3 .NET 6.0 -
Benchmark4 .NET 6.0 -
Benchmark5 .NET 6.0 -
Benchmark1 .NET 7.0 -
Benchmark2 .NET 7.0 -
Benchmark3 .NET 7.0 -
Benchmark4 .NET 7.0 -
Benchmark5 .NET 7.0 -
Benchmark1 .NET 8.0 -
Benchmark2 .NET 8.0 -
Benchmark3 .NET 8.0 -
Benchmark4 .NET 8.0 -
Benchmark5 .NET 8.0 -
Benchmark1 .NET Core 2.1 -
Benchmark2 .NET Core 2.1 -
Benchmark3 .NET Core 2.1 -
Benchmark4 .NET Core 2.1 -
Benchmark5 .NET Core 2.1 -
Benchmark1 .NET Core 3.0 -
Benchmark2 .NET Core 3.0 -
Benchmark3 .NET Core 3.0 -
Benchmark4 .NET Core 3.0 -
Benchmark5 .NET Core 3.0 -

timcassell avatar Apr 22 '24 23:04 timcassell

Hm, still a bit flaky on CI... I think if we can find why the GC.Collect is causing allocations, we will finally solve the flaky measurements. Unfortunately, I don't know how to debug it. I tried using EtwProfiler, but I get an error when I try to open the results (and I'm not even sure if that's the right tool for this). Does EtwProfiler only work with Full Framework?

image

timcassell avatar Apr 24 '24 03:04 timcassell

Okay, I tested again in a plain console app. I copied the GcStats struct and ran the measurement exactly the same using the copy/pasted struct from 2 different assemblies (with the GC.Collect() included in the measurement).

Code

public static void Main()
{
    var bench = new Bench();
    var action = bench.Benchmark1;
    Warmup(action);
    Console.WriteLine($"allocatedBytesFromSelf {MeasureAllocatedBytesFromSelf(action)}");
    Console.WriteLine($"allocatedBytesFromBDN {MeasureAllocatedBytesFromBDN(action)}");
}

[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
static void Warmup(Action action)
{
    for (int i = 0; i < 100; i++)
    {
        action();
    }
}

static long? MeasureAllocatedBytesFromSelf(Action action)
{
    // Warmup measurement.
    DeadCodeEliminationHelper.KeepAliveWithoutBoxing(ClassLibrary1.GcStats.ReadInitial());
    DeadCodeEliminationHelper.KeepAliveWithoutBoxing(ClassLibrary1.GcStats.ReadFinal());

    Thread.Sleep(500);

    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    return MeasureSelf(action).GetTotalAllocatedBytes(false);
}

[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
static ClassLibrary1.GcStats MeasureSelf(Action action)
{
    var initial = ClassLibrary1.GcStats.ReadInitial();
    action();
    var final = ClassLibrary1.GcStats.ReadFinal();
    return final - initial;
}

static long? MeasureAllocatedBytesFromBDN(Action action)
{
    // Warmup measurement.
    DeadCodeEliminationHelper.KeepAliveWithoutBoxing(BenchmarkDotNet.Engines.GcStats.ReadInitial());
    DeadCodeEliminationHelper.KeepAliveWithoutBoxing(BenchmarkDotNet.Engines.GcStats.ReadFinal());

    Thread.Sleep(500);

    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    return MeasureBDN(action).GetTotalAllocatedBytes(false);
}

[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
static GcStats MeasureBDN(Action action)
{
    var initial = BenchmarkDotNet.Engines.GcStats.ReadInitial();
    action();
    var final = BenchmarkDotNet.Engines.GcStats.ReadFinal();
    return final - initial;
}

This prints

allocatedBytesFromSelf 0
allocatedBytesFromBDN 336

There must be something else in the BenchmarkDotNet assembly causing those allocations. But I can't find what it is.

I can only repro in net8.0. net48 measures 0 for both.

[Edit] I even tried Visual Studio's Performance Profiler, and it's not showing me anything. 😞

[Edit2] I didn't copy RuntimeInformation before. I also copied that and then I saw the same behavior. Still not sure how that's causing allocations, though.

timcassell avatar Apr 25 '24 07:04 timcassell

Update: I found the culprit and opened an issue in dotnet/runtime. https://github.com/dotnet/runtime/issues/101536

timcassell avatar Apr 25 '24 10:04 timcassell

I disabled the flaky tests and left a todo to re-enable them when that issue is fixed.

This is as accurate as we can make the memory diagnoser from our side. I think this should be good to merge now, unless you have any further feedback @adamsitnik.

timcassell avatar Apr 25 '24 12:04 timcassell

Ok, I re-enabled the tests and added a finalizer thread blocker to make the tests less flaky (nice idea @jkotas). This does not have any effect on real world measurements, though.

timcassell avatar Apr 25 '24 18:04 timcassell

Wow, even with tiered jit disabled and finalizer thread blocked, the CI is still flaky.

2024-04-28T00:50:55.0136465Z   Failed AllocationQuantumIsNotAnIssueForNetCore21Plus(toolchain: .NET 8.0) [9 s]
2024-04-28T00:50:55.0137358Z   Error Message:
2024-04-28T00:50:55.0137773Z    Assert.Equal() Failure: Values differ
2024-04-28T00:50:55.0138293Z Expected: 88
2024-04-28T00:50:55.0138634Z Actual:   176

I don't know what causes that behavior, but I just moved the MemoryDiagnoserTests to ManualRunning to stabilize the CI.

timcassell avatar Apr 28 '24 02:04 timcassell

Wow, even with tiered jit disabled and finalizer thread blocked, the CI is still flaky.

My guess is that the next problem are the event source background threads.

jkotas avatar Apr 28 '24 02:04 jkotas

My guess is that the next problem are the event source background threads.

Can you provide more details? I couldn't find information about it from google.

timcassell avatar Apr 28 '24 03:04 timcassell

EventSource and EventPipe create background threads that may allocate. These background threads should be only doing work if there is something listening to the events. It turns out that there is pretty much always something listening to .NET runtime events in typical machine configs these days.

jkotas avatar Apr 28 '24 03:04 jkotas

You mentioned disabling that in the other issue. How can we do that?

timcassell avatar Apr 28 '24 03:04 timcassell

Try to add this to your .csproj:

  <ItemGroup>
    <RuntimeHostConfigurationOption Include="System.Diagnostics.Tracing.EventSource.IsSupported" Value="false" />
  </ItemGroup>

jkotas avatar Apr 28 '24 04:04 jkotas

Well, that didn't seem to help. The test failed exactly the same.

timcassell avatar Apr 28 '24 05:04 timcassell

Does the test harness create any child processes to run the actual tests? If yes, does this setting propagate to the child processes?

jkotas avatar Apr 28 '24 05:04 jkotas

Does the test harness create any child processes to run the actual tests? If yes, does this setting propagate to the child processes?

Yes and yes, RuntimeHostConfigurationOption is one of the settings we explicitly copy.

I also verified it by adding this before the allocation measurement

var switchExists = AppContext.TryGetSwitch("System.Diagnostics.Tracing.EventSource.IsSupported", out var isEnabled);
Host.WriteLine($"// EventSource switchExists: {switchExists}, isEnabled: {isEnabled}");

and it printed // EventSource switchExists: True, isEnabled: False.

timcassell avatar Apr 28 '24 05:04 timcassell

@adamsitnik Do you have any more feedback here?

timcassell avatar Jun 07 '24 17:06 timcassell