
Continuous benchmarks

Open thufschmitt opened this issue 3 years ago • 8 comments

Is your feature request related to a problem? Please describe.

The Nix evaluator is very performance-sensitive (it sits in the hot path of many use cases), yet its performance is never tracked. This leads to issues such as https://github.com/NixOS/nix/issues/4847, and makes changes like https://github.com/NixOS/nix/pull/4511 more painful because there’s no standard way to evaluate their performance impact.

Describe the solution you'd like

In my ideal world:

  1. A standard set of benchmarks that could be run with make bench and nix build .#benchmarks
  2. A CI job to run them on a regular basis
  3. A nice dashboard reporting the results of the benchmarks through the git history of the project so that we can quickly track performance regressions
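
As a rough illustration of what a single benchmark behind item 1 might boil down to, here is a minimal Python sketch that times a command several times and reports the median. It is only an assumption about the shape of such a harness, not an existing part of Nix: the `time_command` helper is hypothetical, and the workload is a placeholder that would be replaced by a real evaluation command.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run `argv` `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing command raise instead of being timed silently.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder workload (hypothetical): swap in the evaluation to benchmark.
    workload = [sys.executable, "-c", "sum(range(100000))"]
    samples = time_command(workload)
    print(f"median: {statistics.median(samples):.4f}s over {len(samples)} runs")
```

The median is used rather than the mean so that a single noisy run has less influence on the reported number.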

thufschmitt avatar Jun 09 '21 06:06 thufschmitt

I marked this as stale due to inactivity.

stale[bot] avatar Jan 03 '22 21:01 stale[bot]

Still relevant.

picnoir avatar Jan 07 '22 11:01 picnoir

Related: https://github.com/NixOS/nix/pull/5978

thufschmitt avatar Jan 25 '22 10:01 thufschmitt

This shows the charts from the various Nixpkgs/NixOS evaluation tests: https://hydra.nixos.org/job/nix/master/metrics.nixpkgs#tabs-charts. Obviously not many data points yet.

edolstra avatar Jan 25 '22 11:01 edolstra

> This shows the charts from the various Nixpkgs/NixOS evaluation tests: https://hydra.nixos.org/job/nix/master/metrics.nixpkgs#tabs-charts. Obviously not many data points yet.

That’s pretty awesome, I didn’t know Hydra had such a feature! Are these also running on a fixed, dedicated machine to ensure the reliability of the resource-dependent metrics?

thufschmitt avatar Jan 25 '22 12:01 thufschmitt

Yes, it's a machine provided by @vcunat that runs only one job at a time.

edolstra avatar Jan 25 '22 15:01 edolstra

The dedicated machine is idle almost all the time, so feel free to find more sensible uses for it. The main limitation of the current HW is that it can't be upgraded beyond the current 8 GB of RAM, but I hope that can suffice.

vcunat avatar Jan 25 '22 18:01 vcunat

Ah, and I still don't know of a good mechanism to ensure that no job coincides with GC, but that noise should be very rare currently.

vcunat avatar Jan 25 '22 18:01 vcunat

Would we want to be able to catch performance regressions in CI?

If so, I've been working on a continuous benchmarking tool called Bencher: https://github.com/bencherdev/bencher. I would be more than happy to build an adapter for Nix to track results. The charts would look similar to what is in Hydra at the moment. We would also be able to set statistical thresholds and fail a PR if it contained a performance regression.
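
To make the idea of a statistical threshold concrete, here is a minimal Python sketch of one possible check: flag a new measurement when it lies more than a chosen number of standard deviations above the mean of earlier runs. This is a generic illustration, not a description of how Bencher actually computes its thresholds. The `is_regression` helper, the default threshold of 3.0, and the sample timings are all assumptions made up for the example.

```python
from statistics import mean, stdev


def is_regression(history, candidate, z_threshold=3.0):
    """Return True if `candidate` is more than `z_threshold` sample
    standard deviations above the mean of `history` (higher is worse)."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No observed variance: any increase counts as a regression.
        return candidate > mu
    return (candidate - mu) / sigma > z_threshold


if __name__ == "__main__":
    # Hypothetical evaluation times in seconds from earlier runs.
    history = [10.1, 9.9, 10.0, 10.2, 9.8]
    print(is_regression(history, 10.1))  # within normal noise
    print(is_regression(history, 12.0))  # far above the historical mean
```

A CI job built on such a check would exit with a non-zero status when it returns True, which is what would fail the PR. The check is only meaningful if the history comes from a quiet, dedicated machine, since noisy measurements inflate the standard deviation and hide real regressions.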

epompeii avatar Apr 17 '23 13:04 epompeii

Discussed during the Nix maintainers meeting on 2024-02-12. We already have some basic benchmark tracking in Hydra (https://hydra.nixos.org/job/nix/master/metrics.nixpkgs#tabs-charts), which would be good enough for a first version. The two bottlenecks here are:

  • Documenting that (the hydra jobset above isn't documented anywhere except in this issue, yet it's already very valuable)
  • Getting some reliable benchmarks running in Hydra (This gist by @pennae seems like a great starting point)

This is open for contributions, and I'd be happy to give some guidance if anyone wants to help.

  • Might arrive with the Hercules CI setup
  • @thufschmitt: Do we want something that we can have right now (GitHub Actions)?
  • Could also expand the current benchmark suite on Hydra
    • https://hydra.nixos.org/job/nix/master/metrics.nixpkgs#tabs-charts for the current benchmarks
  • Open for contributions. @pennae has some methodology shared here

TODO (@thufschmitt): Update the issue to suggest the Hydra way and check the documentation of the jobset

thufschmitt avatar Feb 14 '24 12:02 thufschmitt

@epompeii sorry for not answering earlier. Bencher looks great, and might be something we want to look at in the future. We probably wouldn't even need a dedicated adapter, as the benchmarks would probably run under hyperfine or a generic C++ benchmark framework. It's not our priority right now though (the bottleneck is having and documenting the benchmarks; Hydra is good enough as a tool to visualize them atm).

thufschmitt avatar Feb 14 '24 12:02 thufschmitt

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-02-12-nix-team-meeting-minutes-123/39775/1

nixos-discourse avatar Feb 14 '24 13:02 nixos-discourse

> @epompeii sorry for not answering earlier. Bencher looks great, and might be something we want to look at in the future. We probably wouldn't even need a dedicated adapter, as the benchmarks would probably run under hyperfine or a generic C++ benchmark framework. It's not our priority right now though (the bottleneck is having and documenting the benchmarks; Hydra is good enough as a tool to visualize them atm).

Thank you for the kind words! When it does become a priority, please feel free to reach out. I would be more than happy to help with the integration/custom adapter. 😃

epompeii avatar Feb 14 '24 13:02 epompeii