add stack sampling profiler for flamegraphs

Open Mic92 opened this issue 1 year ago • 7 comments

This code is a bit rough in the sense that I have taken shortcuts to iterate quickly. It is not ready for merging yet, but I think it's already useful for people who want to generate flamegraphs for things as large as NixOS.

Usage:

$ NIX_PROFILE_FILE=/tmp/nixos-trace nix run github:mic92/nix-1/sampling-profiler eval -v --no-eval-cache github:mic92/dotfiles#nixosConfigurations.turingmachine.config.system.build.toplevel    

In this example the result is stored in /tmp/nixos-trace. It can be imported into tools that support folded stacks, e.g. https://www.speedscope.app/ or the original flamegraph script (https://github.com/brendangregg/FlameGraph).
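
For a quick look at such a file without a viewer, here is a minimal Python sketch. It assumes the common folded-stack layout used by the FlameGraph scripts (semicolon-separated frames, a space, then a sample count per line); the exact frame labels this profiler emits are not shown here, so the sample lines and the helper names `parse_folded` and `self_samples` are illustrative only.

```python
from collections import Counter


def parse_folded(lines):
    """Yield (frames, count) pairs from folded-stack lines.

    Assumed line layout: "frame1;frame2;frame3 <count>".
    Blank or malformed lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the LAST space so frame labels may contain spaces.
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue
        yield stack.split(";"), int(count)


def self_samples(lines):
    """Sum sample counts per innermost (leaf) frame."""
    totals = Counter()
    for frames, count in parse_folded(lines):
        totals[frames[-1]] += count
    return totals


# Hypothetical sample data, not taken from a real trace.
sample = [
    "a.nix:1;b.nix:2 3",
    "a.nix:1;c.nix:5 1",
    "a.nix:1;b.nix:2 2",
]
top = self_samples(sample).most_common(1)
```

For a real trace, pass an open file object instead of the list (for example `with open("/tmp/nixos-trace") as f: self_samples(f)`), so the file is streamed line by line rather than loaded into memory.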

The profiler records a stack trace of the Nix evaluation every 10ms (100Hz).
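
Given the fixed interval, a sample count can be turned into an approximate duration. The sketch below assumes every sample stands for one full 10 ms interval, which is only an estimate; the function names are made up for illustration.

```python
# Sampling interval stated in the description: 10 ms.
SAMPLE_INTERVAL_S = 0.010


def frequency_hz(interval_s=SAMPLE_INTERVAL_S):
    """Sampling frequency implied by the interval."""
    return 1.0 / interval_s


def samples_to_seconds(samples, interval_s=SAMPLE_INTERVAL_S):
    """Approximate time covered by a number of samples.

    Assumes each sample represents exactly one interval.
    """
    return samples * interval_s
```

Under that assumption, a frame that appears in 250 samples accounts for about 2.5 seconds of evaluation time.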

The resulting file compresses well with zstd:

/tmp/nixos-trace     :  0.27%   (  2.15 GiB =>   5.95 MiB, /tmp/nixos-trace.zst) 

Mic92 avatar Aug 26 '24 08:08 Mic92

This is a screenshot from evaluating my NixOS machine:

[Screenshot: nixos-trace - speedscope]

Mic92 avatar Aug 26 '24 08:08 Mic92

Here is an example trace: https://github.com/Mic92/nix-1/releases/download/assets/nixos-trace.zst

You can download and decompress it with zstd:

zstd -d /tmp/nixos-trace.zst

Then visit https://www.speedscope.app/ and import it.

Mic92 avatar Aug 26 '24 08:08 Mic92

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/flamegraphs-for-nixos/51183/1

nixos-discourse avatar Aug 26 '24 09:08 nixos-discourse

cc @picnoir @Atemu who were involved in the Tracy-based profiler: https://github.com/NixOS/nix/pull/9967

Mic92 avatar Aug 26 '24 09:08 Mic92

Thanks a bunch for this! I tried going through your flame graph and have some feedback:

  1. File + line isn't very useful info at a glance. To actually get an idea of what's going on, you'd need the name of the binding on that line in addition to the file+line number
  2. There's a bunch of «none»s in the output, what do those represent?

Atemu avatar Aug 26 '24 11:08 Atemu

> Thanks a bunch for this! I tried going through your flame graph and have some feedback:
>
> 1. File + line isn't very useful info at a glance. To actually get an idea of what's going on, you'd need the name of the binding on that line in addition to the file+line number

The issue is that functions in Nix don't really have names. Often enough they are assigned to variables, but not always, e.g. when they are passed to other functions.

> 2. There's a bunch of `«none»`s in the output, what do those represent?

For example, if a builtin calls a function, then we don't have a position at the moment. I might be able to provide the string name of builtins, but I don't know if I can get the position in this case as well.

Mic92 avatar Aug 26 '24 11:08 Mic92

> The issue is that functions in Nix don't really have names.

True, but ExprLambda has a name that's based on its context, as it is often part of a binding. It's imperfect information, but it works well.

> 1. There's a bunch of «none»s in the output, what do those represent?

This may be the position of the call instead of the position of the function. ExprLambda has its own position which I believe is always available.

roberth avatar Aug 26 '24 12:08 roberth

Rebased + macOS fix. I haven't addressed any of the comments yet.

Mic92 avatar Sep 09 '24 07:09 Mic92

@Mic92 sorry to bump, but since it's been a while since there's been any activity on this, do you have any advice for people interested in pushing this feature forward? I personally would appreciate any recommendations you have for what should be done next.

ConnorBaker avatar May 15 '25 19:05 ConnorBaker

@ConnorBaker hopefully this https://github.com/NixOS/nix/pull/13219 can get the ball rolling once again. @Mic92 please check that the API I provided in that PR is enough to accomplish what is necessary here.

xokdvium avatar May 16 '25 22:05 xokdvium

Indeed. I needed something like that. Thanks for looking into it.

Mic92 avatar May 16 '25 23:05 Mic92

I've also taken the liberty to rebase this patch on top of the proposed EvalProfiler (as a technical POC) and added more information (primop name and lambda name) https://github.com/NixOS/nix/pull/13220. Feel free to cherry-pick changes from that branch.

xokdvium avatar May 17 '25 11:05 xokdvium

Let's go with your pull request in https://github.com/NixOS/nix/pull/13220

Mic92 avatar May 18 '25 19:05 Mic92