sightglass
A benchmark suite and tool to compare different implementations of the same primitives.
@jameysharp mentioned in https://github.com/bytecodealliance/sightglass/issues/202#issuecomment-1269162316 that it would be nice if measurement results were emitted as soon as they were collected. Currently this is not the case: all the measurements for...
Right now, Sightglass uses a single threshold based on a confidence interval computed by Behrens-Fisher to determine whether a sampled statistic shifted between configurations. The result of this is that...
When measuring more than one benchmark, it would be nice to be able to aggregate the results into a single score. One common way to do this is to take...
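The excerpt above is cut off before it names the aggregation method. As an illustration only, and as an assumption rather than what the issue proposes, the sketch below uses the geometric mean of per-benchmark ratios (for example, new time divided by baseline time), which is a commonly used aggregate for benchmark suites. The function name `geometric_mean` is hypothetical and is not part of Sightglass.

```rust
/// Geometric mean of per-benchmark ratios (hypothetical helper, not a
/// Sightglass API). Returns `None` for an empty slice or if any ratio is
/// non-finite or not strictly positive, since the logarithm is undefined there.
fn geometric_mean(ratios: &[f64]) -> Option<f64> {
    if ratios.is_empty() || ratios.iter().any(|r| !(r.is_finite() && *r > 0.0)) {
        return None;
    }
    // Sum logarithms instead of multiplying directly, so long lists of
    // ratios neither overflow nor underflow.
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    Some((log_sum / ratios.len() as f64).exp())
}
```

With this aggregate, a 2x slowdown on one benchmark and a 2x speedup on another cancel out to a score of about 1.0, which an arithmetic mean of the same ratios would not do.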
This change adds the beginnings of a new V8 engine to Sightglass. It uses V8's `libwee8` library as the backing engine and constructs a `libengine.so` in C++ that is compatible...
From #138: > Ah, and one more thought: have we considered any statistical analysis that would look for multi-modal distributions (and warn, at least)? If we see that e.g. half...
In order to report performance results based on PRs, we talked about implementing an HTTP server (e.g. in `crates/server`) that would: - listen for incoming `POST` requests that contain JSON...
From https://github.com/bytecodealliance/sightglass/issues/138: > Observe CPU governor settings when on a known platform (Linux: `/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor` text file, will usually be `ondemand`, we want `performance`) and warn if scaling is turned on...
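The governor check described in the quote could look roughly like the following. This is a minimal sketch assuming a Linux sysfs layout of `cpu<N>/cpufreq/scaling_governor` under the CPU root; the function names `non_performance_governors` and `warn_if_scaling_enabled` are hypothetical and do not come from Sightglass.

```rust
use std::fs;
use std::path::Path;

/// Returns `(cpu, governor)` pairs for every `cpu<N>` directory under
/// `cpu_root` whose scaling governor is not `performance`.
/// Hypothetical helper; an unreadable root or file is silently skipped,
/// so the result is empty on platforms without this sysfs layout.
fn non_performance_governors(cpu_root: &Path) -> Vec<(String, String)> {
    let mut offenders = Vec::new();
    let entries = match fs::read_dir(cpu_root) {
        Ok(entries) => entries,
        Err(_) => return offenders,
    };
    for entry in entries.flatten() {
        let name = entry.file_name().to_string_lossy().into_owned();
        // Only consider `cpu<N>` directories, not siblings such as `cpufreq`.
        let is_cpu_dir = name.strip_prefix("cpu").map_or(false, |rest| {
            !rest.is_empty() && rest.chars().all(|c| c.is_ascii_digit())
        });
        if !is_cpu_dir {
            continue;
        }
        let governor_file = entry.path().join("cpufreq").join("scaling_governor");
        if let Ok(contents) = fs::read_to_string(&governor_file) {
            let governor = contents.trim().to_string();
            if governor != "performance" {
                offenders.push((name, governor));
            }
        }
    }
    // Sort (lexicographically) so the output order is deterministic.
    offenders.sort();
    offenders
}

/// Prints one warning per CPU that may scale its frequency during a run.
fn warn_if_scaling_enabled() {
    let root = Path::new("/sys/devices/system/cpu");
    for (cpu, governor) in non_performance_governors(root) {
        eprintln!(
            "warning: {} uses the `{}` governor; results may be noisy",
            cpu, governor
        );
    }
}
```

Taking the root directory as a parameter keeps the check testable against a temporary directory, and treating missing files as "nothing to report" means the check degrades to a no-op on platforms other than Linux.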
Most modern CPUs scale their clock frequency according to demand, and this CPU frequency scaling is always a headache when running benchmarks. There are two main dimensions in which this...
From https://github.com/bytecodealliance/sightglass/issues/138: > Interleave benchmark runs appropriately. Right now, it looks like the top-level runner does a batch of runs with one engine, then a batch of runs with another....
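One way to picture the difference from batched runs is a round-robin schedule in which every engine runs once per round, so slow drift in system state (thermal throttling, background load) affects all engines roughly equally. The sketch below only builds such an ordering; `interleaved_schedule` and the engine names in the usage note are hypothetical, and the quoted issue may have a different interleaving scheme in mind.

```rust
/// Builds a round-robin run order: in each round, every engine appears
/// once, in the order given. Each entry is `(round, engine)`.
/// Hypothetical helper, not part of the Sightglass runner.
fn interleaved_schedule<'a>(engines: &[&'a str], rounds: usize) -> Vec<(usize, &'a str)> {
    let mut schedule = Vec::with_capacity(engines.len() * rounds);
    for round in 0..rounds {
        for engine in engines {
            schedule.push((round, *engine));
        }
    }
    schedule
}
```

For two engines `a.so` and `b.so` and two rounds, this yields the order `a.so`, `b.so`, `a.so`, `b.so`, rather than the batched order `a.so`, `a.so`, `b.so`, `b.so`.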
This would allow us to get more samples in much less time (we could get, say, ten instantiation and execution samples per compilation) but would also let us stress test...