
pprof profile types

Open kolesnikovae opened this issue 2 years ago • 1 comments

Pyroscope supports only predefined profile types: cpu, inuse_objects, alloc_memory, etc. Based on the profile type, pyroscope determines the following parameters:

  • Whether the profile is cumulative or not. At ingestion time, we take the difference of two consecutive cumulative profiles and treat it as a delta profile.
  • Aggregation type. At query time, the pyroscope server uses one of the aggregation functions:
    • average for instant profiles (e.g. Go heap inuse_space and inuse_objects).
    • sum for other profile types.
  • Units. This is only important for the presentation layer.
  • Display name (profile type name override).
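To make the first two parameters concrete, here is a minimal sketch of the delta and aggregation steps. It is not the actual pyroscope implementation: `Stacks`, `Aggregation`, `Delta`, and `Aggregate` are hypothetical names, a profile is reduced to a map from a flattened stack to a value, and dropping non-positive deltas is a simplifying assumption.

```go
package main

import "fmt"

// Aggregation is a hypothetical enum for the query-time aggregation function.
type Aggregation int

const (
	Sum Aggregation = iota
	Average
)

// Stacks is a simplified stand-in for a profile: flattened stack -> value.
type Stacks map[string]int64

// Delta subtracts the previous cumulative profile from the current one.
// Stacks whose value did not grow are dropped (an assumption of this sketch).
func Delta(prev, cur Stacks) Stacks {
	out := Stacks{}
	for stack, v := range cur {
		if d := v - prev[stack]; d > 0 {
			out[stack] = d
		}
	}
	return out
}

// Aggregate merges profiles by summing values per stack; for Average the
// sums are then divided by the number of merged profiles.
func Aggregate(agg Aggregation, profiles ...Stacks) Stacks {
	out := Stacks{}
	for _, p := range profiles {
		for stack, v := range p {
			out[stack] += v
		}
	}
	if agg == Average && len(profiles) > 0 {
		n := int64(len(profiles))
		for stack := range out {
			out[stack] /= n
		}
	}
	return out
}

func main() {
	prev := Stacks{"main;work": 100, "main;idle": 40}
	cur := Stacks{"main;work": 130, "main;idle": 40}
	fmt.Println(Delta(prev, cur))

	a, b := Stacks{"main;alloc": 10}, Stacks{"main;alloc": 30}
	fmt.Println(Aggregate(Average, a, b), Aggregate(Sum, a, b))
}
```

Averaging suits instant profiles because each one is a snapshot of the current state, while delta profiles describe disjoint intervals and can simply be summed.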

However, the pprof format allows a profile to contain multiple sample types, which we treat as profile types in pyroscope. For example, a Go heap profile contains four sample types (or profiles): inuse_space, inuse_objects, alloc_space, and alloc_objects, while a NodeJS heap profile contains two: objects and space.
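The relationship between one pprof profile and several pyroscope profile types can be sketched as follows. The structs below are simplified stand-ins for the pprof messages (stacks are function names rather than location IDs), and `SplitBySampleType` is a hypothetical helper, not pyroscope code. The one property taken from the pprof format is that the i-th value of each sample belongs to the i-th sample type.

```go
package main

import (
	"fmt"
	"strings"
)

// ValueType describes one sample type, e.g. {"inuse_space", "bytes"}.
type ValueType struct{ Type, Unit string }

// Sample carries one value per sample type: Value[i] belongs to SampleType[i].
type Sample struct {
	Stack []string
	Value []int64
}

// Profile is a simplified pprof profile.
type Profile struct {
	SampleType []ValueType
	Sample     []Sample
}

// SplitBySampleType produces one flattened series per sample type,
// keyed by the sample type name.
func SplitBySampleType(p Profile) map[string]map[string]int64 {
	out := make(map[string]map[string]int64, len(p.SampleType))
	for i, st := range p.SampleType {
		series := map[string]int64{}
		for _, s := range p.Sample {
			if i < len(s.Value) {
				series[strings.Join(s.Stack, ";")] += s.Value[i]
			}
		}
		out[st.Type] = series
	}
	return out
}

func main() {
	heap := Profile{
		SampleType: []ValueType{{"inuse_objects", "count"}, {"inuse_space", "bytes"}},
		Sample: []Sample{
			{Stack: []string{"main", "load"}, Value: []int64{3, 4096}},
			{Stack: []string{"main", "parse"}, Value: []int64{1, 512}},
		},
	}
	for name, series := range SplitBySampleType(heap) {
		fmt.Println(name, series)
	}
}
```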

Therefore we have pre-defined profile type parameters for every supported sample type: https://github.com/pyroscope-io/pyroscope/blob/ba568d06cf4dfe7504bb396a5a8f680a624c0492/pkg/storage/tree/pprof.go#L7-L37

The problem is that we will inevitably encounter collisions:

  • between pprof origins, like Go and NodeJS.
  • between pprof profiles. For instance, both Go mutex and block profiles have a contentions sample type. Although they have identical parameters, we need to override profile type names in order to avoid mixing profiling data.

Therefore, for pprof data we need to resolve the profile parameters based on:

  • spy name (origin/runtime): Go, NodeJS, etc.
  • profile name (cpu, heap, goroutines, and so on).
  • sample type.

So that the keys in the map would be (the example is just for clarity, implementation details may change):

  • go.cpu.samples
  • go.block.contentions
  • nodejs.heap.space
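A minimal sketch of such a lookup, in the same spirit as the keys above. Everything here is hypothetical: the `TypeParams` struct, the `registry` contents, and the display names and units are illustrative values, not the real pyroscope defaults.

```go
package main

import (
	"fmt"
	"strings"
)

// TypeParams holds the per-profile-type parameters discussed above.
type TypeParams struct {
	DisplayName string
	Units       string
	Cumulative  bool
	Average     bool // true: average at query time, false: sum
}

// registry is keyed by "<spy name>.<profile name>.<sample type>".
// The values are made up for illustration.
var registry = map[string]TypeParams{
	"go.cpu.samples":       {DisplayName: "cpu", Units: "samples"},
	"go.block.contentions": {DisplayName: "block_count", Units: "count", Cumulative: true},
	"go.mutex.contentions": {DisplayName: "mutex_count", Units: "count", Cumulative: true},
	"nodejs.heap.space":    {DisplayName: "inuse_space", Units: "bytes", Average: true},
}

// Resolve looks up the parameters for a (spy, profile, sample type) triple.
func Resolve(spy, profile, sampleType string) (TypeParams, bool) {
	key := strings.ToLower(strings.Join([]string{spy, profile, sampleType}, "."))
	p, ok := registry[key]
	return p, ok
}

func main() {
	// The same sample type resolves differently depending on the profile name.
	for _, profile := range []string{"block", "mutex"} {
		p, ok := Resolve("go", profile, "contentions")
		fmt.Println(profile, p.DisplayName, ok)
	}
}
```

With the profile name in the key, the shared contentions sample type no longer collides, and an unknown triple can be rejected or handled by a fallback instead of being silently mixed into another series.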

To achieve this, we need to address two pretty minor issues:

  1. In pull mode, we don't have a spy name to differentiate between, say, Go pprof data and pprof-nodejs data. I think we should require the "spy name" to be provided in the scrape configuration. Alternatively, we could try to determine the source with heuristics (like file extensions).
  2. In push mode, we don’t have a profile name. We should extend the /ingest endpoint and require the profile type/name explicitly for the pprof format.

kolesnikovae avatar Apr 01 '22 19:04 kolesnikovae

Just my $0.02

If we have this mapping on the frontend, we can enrich it with more useful features, like graying out everything that isn't your code (say, node internals or library code). This can be done by introducing additional rules for each spy name and profile name pair.

shaleynikov avatar Apr 01 '22 19:04 shaleynikov