H-5331: Support CPU and Wall time profiling in benchmarks
What is the purpose of this PR?
This implements CPU time profiling (using pprof) and wall time profiling (using tracing spans). Each profiler can be hooked in by enabling it in the telemetry config.
What does this change?
- Implement wall time profiling by introducing a layer. This creates a new layer instead of using `tracing-flame` for three reasons:
  - We want to profile the idle time as well. This is important to also see the actual timings for asynchronous calls such as DB calls, which is one of the main reasons why we want this in the first place. Also, this aligns with the behavior of `tracing-opentelemetry`, which means that the resulting flame graph will be similar to the traces
  - `tracing-flame` only supports an `io::Write` interface, but we want to collect the data asynchronously in a dedicated thread using channels. While we could use `write` and `flush` to send messages, using channels is easier, in particular because it allows us to use dedicated errors
  - `tracing-flame` creates a folded row for each enter/exit of a span, resulting in a huge amount of data. It is enough to generate only a single row for flame graphs (we would need a row per enter/exit for flame charts, but for that functionality we have proper tracing)
- Implement CPU time profiling by using Pyroscope's `pprof` implementation
Both implementations can be disabled separately.
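To illustrate the channel-based collection and the "single folded row" idea described above, here is a minimal sketch using only the standard library. It is not the code from this PR: `SpanClosed`, `spawn_collector`, and the microsecond unit are hypothetical names and choices made for illustration, and how the duration is measured (for example, whether child span time is subtracted) is deliberately left out.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical message, sent once when a span closes.
pub struct SpanClosed {
    /// Span names from root to leaf, e.g. ["request", "db_query"].
    pub stack: Vec<&'static str>,
    /// Wall time attributed to the span, idle time included (assumption).
    pub wall_time: Duration,
}

/// Spawns a dedicated collector thread.
///
/// Returns the sending half of the channel and a handle that yields the
/// folded rows once every sender has been dropped.
pub fn spawn_collector() -> (Sender<SpanClosed>, JoinHandle<Vec<String>>) {
    let (sender, receiver) = mpsc::channel::<SpanClosed>();

    let handle = thread::spawn(move || {
        // One entry per distinct stack instead of one row per enter/exit.
        let mut totals: BTreeMap<String, u128> = BTreeMap::new();

        // The loop ends when all senders are gone.
        for message in receiver {
            let key = message.stack.join(";");
            *totals.entry(key).or_insert(0) += message.wall_time.as_micros();
        }

        // Folded-stack format: "root;child;leaf <value>".
        totals
            .into_iter()
            .map(|(stack, micros)| format!("{stack} {micros}"))
            .collect()
    });

    (sender, handle)
}
```

In this sketch, the tracing layer would hold a clone of the `Sender` and call `send` when a span closes; dropping the last sender lets the thread finish so the rows can be collected with `join`. A failed `send` returns a typed error, which is one way the "dedicated errors" point could play out.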
Pre-Merge Checklist
Has this modified a publishable library?
This PR:
- [x] does not modify any publishable blocks or libraries, or modifications do not need publishing
Does this require a change to the docs?
The changes in this PR:
- [x] are internal and do not require a docs change
Does this require a change to the Turbo Graph?
The changes in this PR:
- [x] do not affect the execution graph
Known issues
This is currently disabled in production because:
- Wall time profiling results in timeouts in the app (probably related to the `std::thread` approach over `tokio::task`)
- CPU profiling does not start. Also, proper CPU profiling in the graph would need tags per endpoint, which we don't have yet
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 54.70%. Comparing base (df279d6) to head (00ff4c7).
:warning: Report is 348 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #7789 +/- ##
==========================================
- Coverage 54.71% 54.70% -0.01%
==========================================
Files 1085 1085
Lines 96195 96207 +12
Branches 4547 4553 +6
==========================================
Hits 52632 52632
- Misses 42976 42988 +12
Partials 587 587
| Flag | Coverage Δ |
|---|---|
| apps.hash-ai-worker-ts | 1.32% <ø> (ø) |
| apps.hash-api | 0.00% <ø> (ø) |
| local.harpc-client | 50.93% <ø> (ø) |
| local.hash-backend-utils | 3.69% <ø> (ø) |
| local.hash-graph-sdk | 0.00% <ø> (ø) |
| local.hash-isomorphic-utils | 0.00% <ø> (ø) |
| rust.antsi | 0.00% <ø> (ø) |
| rust.error-stack | 88.77% <ø> (ø) |
| rust.harpc-codec | 84.22% <ø> (ø) |
| rust.harpc-net | 96.10% <ø> (ø) |
| rust.harpc-tower | 66.80% <ø> (ø) |
| rust.harpc-types | 0.00% <ø> (ø) |
| rust.harpc-wire-protocol | 92.23% <ø> (ø) |
| rust.hash-codec | 72.52% <ø> (ø) |
| rust.hash-graph-api | 3.17% <ø> (ø) |
| rust.hash-graph-postgres-store | 20.06% <ø> (ø) |
| rust.hash-graph-store | 32.93% <ø> (ø) |
| rust.hash-graph-temporal-versioning | 48.22% <ø> (ø) |
| rust.hash-graph-validation | 83.29% <ø> (ø) |
| rust.hashql-ast | 86.45% <ø> (ø) |
| rust.hashql-core | 82.26% <ø> (ø) |
| rust.hashql-diagnostics | 50.24% <ø> (ø) |
| rust.hashql-eval | 71.85% <ø> (ø) |
| rust.hashql-hir | 86.25% <ø> (ø) |
| rust.hashql-syntax-jexpr | 94.20% <ø> (ø) |
| rust.sarif | 97.93% <ø> (ø) |
Benchmarks don't finish; something is going on that is not reproducible locally. Converting to draft.
How might the CodSpeed PR (https://github.com/hashintel/hash/pull/8204) interact with this open PR? @TimDiekmann @indietyp