
Summary Save/Load/Combine

Open AndreyAkinshin opened this issue 9 years ago • 10 comments

Now BenchmarkDotNet allows you to create a nice set of jobs for different environments. However, sometimes it's impossible to get all the desired results with a single run. Examples:

  • Comparing performance before and after refactoring
  • Comparing performance between different versions of a NuGet package (probably will be implemented in the future, but it's not available for now)
  • Comparing performance between different OSes (Windows/Linux/macOS)

So, I suggest adding Save/Load/Combine methods which will help to solve these problems. An example:

// Part 1: Windows
var summary = BenchmarkRunner.Run<MyBench>();
summary.Save("windows.json");
// Part 2: Linux
var summary = BenchmarkRunner.Run<MyBench>();
summary.Save("linux.json");
// Part 3: Common summary table
var summaryWindows = Summary.Load("windows.json");
var summaryLinux   = Summary.Load("linux.json");
var summary = Summary.Combine(summaryWindows, summaryLinux);

Some additional thoughts / points for discussions.

  • Is it OK to save the summary in JSON format?
  • Probably it would be nice to have an API for adding additional columns to the current summary, like `var newSummary = mySummary.WithAdditionalColumn("VersionOfMyLibrary", "0.10.0")`.

@adamsitnik, @mattwarren, @ig-sinicyn, @terrajobst, what do you think?

AndreyAkinshin avatar Nov 23 '16 06:11 AndreyAkinshin

@AndreyAkinshin, please, not JSON. There's a standard format for storing and exchanging data: XML. It has good conventions for value escaping and date representation. It has a strong specification without silly logical errors such as undefined behavior for duplicate keys. And there's an API for it out of the box; no additional dependencies required.

Why JSON? And about the API: why not expose it as an Exporter?

Oh, and now comes the worst part: how should Combine() work at all? Averaging? No way.

Case 1.

Imagine two benchmark methods, A and B.

On PC1:
`A` - 1000 ns (`Vector.HardwareAccelerated` is false); `B` - 100 ns

On PC2:
`A` - 150 ns (`Vector.HardwareAccelerated` is true); `B` - 300 ns

Now what? The bad news: we don't need to imagine it. It's a real case we faced a year ago.

Case 2.

On PC1:
`A` - 200ns

On PC2:
`B` - 150 ns

How should these be merged at all? There's no common baseline, so we have no idea what the actual ratio is. It could be:

On PC2:
`B` - 150 ns; `A` - 50 ns
-or-
`B` - 150 ns; `A` - 2000 ns

The only way to compare them is to calculate relative-to-baseline times. And then the entire summary can be safely shortened to:

<CompetitionBenchmarks>
	<Competition Target="CodeJam.Examples.SimplePerfTest, CodeJam.PerfTests-Tests.NUnit">
		<Candidate Target="Baseline" Baseline="true" />
		<Candidate Target="SlowerX3" MinRatio="2.91" MaxRatio="3.09" />
		<Candidate Target="SlowerX5" MinRatio="4.85" MaxRatio="5.15" />
		<Candidate Target="SlowerX7" MinRatio="6.79" MaxRatio="7.21" />
	</Competition>
</CompetitionBenchmarks>

And yep, I've done it and it works :)

ig-sinicyn avatar Nov 23 '16 06:11 ig-sinicyn

please, not JSON.

Ok, let's use XML. =)

About API: why not expose it as an Exporter?

Sounds good to me. Thus, we also have to define an Importer.

Oh, and now goes the worst part: how the Combine() should work at all?

All of the examples in the issue are about a single PC. You are absolutely right, it's really hard to compare performance numbers across different computers. Probably, we should check the combined environments and print some warnings.
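To make the environment check concrete, here is a minimal sketch of what a combined summary with a warning could look like. Note that `Summary.Combine` does not exist in BenchmarkDotNet; the method body and the particular `HostEnvironmentInfo` properties compared are assumptions for illustration only:

```csharp
// Hypothetical sketch: neither Summary.Combine nor this exact comparison
// is part of BenchmarkDotNet today.
public static Summary Combine(params Summary[] summaries)
{
    var reference = summaries[0].HostEnvironmentInfo;
    foreach (var other in summaries.Skip(1))
    {
        var env = other.HostEnvironmentInfo;
        // Assumed properties for the sketch; the real HostEnvironmentInfo
        // shape may differ between versions.
        if (!Equals(env.OsVersion, reference.OsVersion) ||
            !Equals(env.ProcessorName, reference.ProcessorName))
        {
            Console.WriteLine(
                "Warning: combining summaries from different environments; " +
                "absolute timings may not be directly comparable.");
        }
    }
    // Merging the benchmark reports into one Summary is left out here.
    throw new NotImplementedException();
}
```

The important design point is that the merge never silently averages numbers from different machines; it keeps all reports and merely flags the environment mismatch.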

AndreyAkinshin avatar Nov 23 '16 07:11 AndreyAkinshin

Thus, we also have to define an Importer.

Well, maybe :)

All of the examples in the issue are about a single PC.

The same issues apply to the single-PC case. The timings may change due to IO latency, a firmware upgrade, or an upgrade of BDN itself (future Roslyn upgrades may bring some optimizations like this). There's no point in keeping absolute timings unless you preserve the context too. And then it's better to store it as '.csv' or another tabular format and ETL it into a data-mining service.

The summary is the following:

  • if we want a simple, machine- and human-readable summary that can be compared across multiple runs or machines - use XML as in my previous post.
  • if we want to collect all measurements for data mining - store them in a tabular format.

Actually, I've done both (there's a CsvTimingsExporter, but I've been thinking about switching to something like SQLite one day).

ig-sinicyn avatar Nov 23 '16 07:11 ig-sinicyn

Personally, what I would like to add is the ability to compare a few different versions of a NuGet package.

Something like:

.Add(Job.WithNuget("System.Slices", version: "1.10.0"))
.Add(Job.WithNuget("System.Slices", version: "1.20.0"))

and

.Add(Job.WithNuget("System.Slices", version: Version.Latest))

so people could write some unit tests to guard against performance drops

As for the format, we would have to provide a mechanism like sinks in Serilog: we provide an interface (IExporter/IImporter) and people implement it for XML, MS SQL, SQLite, RavenDB, etc.
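A minimal sketch of the sink-style pair being proposed. `IExporter` exists in BenchmarkDotNet (its real shape differs from what is shown here); `IImporter`, `Import`, and `XmlSummaryImporter` are hypothetical names invented for this example:

```csharp
// Hypothetical importer counterpart to the existing IExporter.
// Nothing here is part of BenchmarkDotNet today.
public interface IImporter
{
    // Reads a previously exported summary back, so it can be
    // combined with the results of the current run.
    Summary Import(Stream source);
}

// People could then implement it per storage backend, Serilog-sink style:
public class XmlSummaryImporter : IImporter
{
    public Summary Import(Stream source)
    {
        // Parse the XML produced by the matching exporter and
        // reconstruct the Summary object. Details omitted.
        throw new NotImplementedException();
    }
}
```

The point of the interface pair is that BDN itself stays storage-agnostic: XML, SQLite, or RavenDB support becomes a third-party implementation detail rather than a core dependency.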

adamsitnik avatar Nov 23 '16 11:11 adamsitnik

@adamsitnik

the same-time run, as in

.Add(Job.WithNuget("System.Slices", version: "1.10.0"))
.Add(Job.WithNuget("System.Slices", version: "1.20.0"))

does not require export/import at all, as BenchmarkConverter.TypeToBenchmarks(type, config) produces a separate benchmark for every combination of job/parameters.
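For illustration, a same-process comparison could be driven along these lines. The exact return type of `BenchmarkConverter.TypeToBenchmarks` and the config builder methods have changed between BDN versions, and `MyBench` is a placeholder type, so treat this as a sketch rather than version-accurate code:

```csharp
// Sketch: two jobs in one config mean every benchmark method runs
// once per job, all within a single invocation.
var config = ManualConfig.CreateEmpty()
    .AddJob(Job.Default.WithId("v1.10.0"))
    .AddJob(Job.Default.WithId("v1.20.0"));

// One benchmark case is produced per (method, job, parameters)
// combination, so both variants land in a single Summary —
// no export/import step is needed for the comparison.
var benchmarks = BenchmarkConverter.TypeToBenchmarks(typeof(MyBench), config);
var summary = BenchmarkRunner.Run(benchmarks);
```

Since everything runs on the same machine in the same session, the cross-environment problems discussed above simply don't arise here.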

ig-sinicyn avatar Nov 23 '16 11:11 ig-sinicyn

do not require export

Yes, exactly; so we eliminate the problems with storage / different PC conditions, but on the other hand it will take twice as much time ;)

adamsitnik avatar Nov 23 '16 11:11 adamsitnik

So is there a way to store and load?

forki avatar Oct 15 '17 09:10 forki

@forki, you can export results, but there is no way to import them for now. I hope this feature will be implemented in the near future.

AndreyAkinshin avatar Nov 16 '17 20:11 AndreyAkinshin

As an addition to the request, there is a package for storing information about tests - https://github.com/approvals/ApprovalTests.Net

All tests are stored as text tables.

Lonli-Lokli avatar Nov 27 '19 12:11 Lonli-Lokli

Hi, I need this feature, but I don't want to make any breaking changes. I would like to create a SnapshotToolchain which implements IToolchain, and a SnapshotExporter which implements IExporter.

I would like to use it like this:

namespace BenchmarkDotNet.Samples
{
    [Config(typeof(Config))]
    [XmlSnapshotExporter]
    public class TheClassWithBenchmarks
    {
        private class Config : ManualConfig
        {
            public Config()
            {
                AddJob(Job.MediumRun);
                AddJob(Job.MediumRun
                    .WithToolchain(SnapshotToolchain.FromXml("path"))
                    .WithId("Snapshot"));
            }
        }

       ...
    }
}

Do you think it would work?

workgroupengineering avatar Jan 03 '22 15:01 workgroupengineering