
H-5331: Support CPU and Wall time profiling in benchmarks

Open TimDiekmann opened this issue 4 months ago • 2 comments

🌟 What is the purpose of this PR?

This implements CPU time profiling (using pprof) and wall time profiling (using tracing spans). Each profiler is hooked in by enabling it in the telemetry config.

πŸ” What does this change?

  • Implement wall time profiling by introducing a new tracing layer rather than using tracing-flame, for three reasons:
    • We want to profile idle time as well. This matters for seeing the actual timings of asynchronous calls such as DB calls, which is one of the main reasons we want this in the first place. It also aligns with the behavior of tracing-opentelemetry, so the resulting flame graph will be similar to the traces.
    • tracing-flame only supports an io::Write interface, but we want to collect the data asynchronously in a dedicated thread using channels. While we could use write and flush to send messages, channels are easier, in particular because they allow us to use dedicated errors.
    • tracing-flame creates a folded row for each enter/exit of a span, resulting in a huge amount of data. For flame graphs, a single row is enough (a row per enter/exit would only be needed for flame charts, and for that functionality we have proper tracing).
  • Implement CPU time profiling using Pyroscope's pprof implementation.

Both implementations can be disabled independently.
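The channel-based collection described above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: `SpanClosed`, `collect`, `render`, and `profile_demo` are hypothetical names, the fixed durations stand in for measured wall time, and folding into one row per distinct stack is just one possible reading of "a single row". The real implementation is a tracing layer, which is omitted here.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical message sent once when a span closes:
/// the span's full stack and its total wall time (idle time included).
struct SpanClosed {
    stack: Vec<&'static str>,
    wall_time: Duration,
}

/// Runs on a dedicated thread and folds all messages into one row per stack.
fn collect(receiver: mpsc::Receiver<SpanClosed>) -> HashMap<String, Duration> {
    let mut folded = HashMap::new();
    // The loop ends once every sender has been dropped.
    for message in receiver {
        let key = message.stack.join(";");
        *folded.entry(key).or_insert(Duration::ZERO) += message.wall_time;
    }
    folded
}

/// Renders the aggregated rows in the folded-stack format: `a;b;c <micros>`.
fn render(folded: &HashMap<String, Duration>) -> Vec<String> {
    let mut rows: Vec<String> = folded
        .iter()
        .map(|(stack, time)| format!("{stack} {}", time.as_micros()))
        .collect();
    rows.sort();
    rows
}

fn profile_demo() -> Vec<String> {
    let (sender, receiver) = mpsc::channel();
    let collector = thread::spawn(move || collect(receiver));

    // Assumed sample data; a real layer would measure these per span.
    let samples = [
        (vec!["request", "db_query"], 300),
        (vec!["request", "db_query"], 200),
        (vec!["request"], 700),
    ];
    for (stack, micros) in samples {
        sender
            .send(SpanClosed { stack, wall_time: Duration::from_micros(micros) })
            .expect("collector thread should still be running");
    }
    // Dropping the sender lets the collector finish.
    drop(sender);

    let folded = collector.join().expect("collector thread should not panic");
    render(&folded)
}

fn main() {
    for row in profile_demo() {
        println!("{row}");
    }
}
```

Because the sender returns a typed `Result`, a closed channel surfaces as a dedicated error instead of an `io::Error`, which is the kind of benefit the second reason above refers to.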

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • [x] does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • [x] are internal and do not require a docs change

πŸ•ΈοΈ Does this require a change to the Turbo Graph?

The changes in this PR:

  • [x] do not affect the execution graph

⚠️ Known issues

This is currently disabled in production because:

  • Wall time profiling results in timeouts in the app (probably related to the std::thread approach over tokio::task)
  • CPU profiling does not start. Also, proper CPU profiling in the graph would need tags per endpoint, which we don't have yet

TimDiekmann, Sep 10 '25 11:09

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.70%. Comparing base (df279d6) to head (00ff4c7).
⚠️ Report is 348 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7789      +/-   ##
==========================================
- Coverage   54.71%   54.70%   -0.01%     
==========================================
  Files        1085     1085              
  Lines       96195    96207      +12     
  Branches     4547     4553       +6     
==========================================
  Hits        52632    52632              
- Misses      42976    42988      +12     
  Partials      587      587              
Flag Coverage Δ
apps.hash-ai-worker-ts 1.32% <ø> (ø)
apps.hash-api 0.00% <ø> (ø)
local.harpc-client 50.93% <ø> (ø)
local.hash-backend-utils 3.69% <ø> (ø)
local.hash-graph-sdk 0.00% <ø> (ø)
local.hash-isomorphic-utils 0.00% <ø> (ø)
rust.antsi 0.00% <ø> (ø)
rust.error-stack 88.77% <ø> (ø)
rust.harpc-codec 84.22% <ø> (ø)
rust.harpc-net 96.10% <ø> (ø)
rust.harpc-tower 66.80% <ø> (ø)
rust.harpc-types 0.00% <ø> (ø)
rust.harpc-wire-protocol 92.23% <ø> (ø)
rust.hash-codec 72.52% <ø> (ø)
rust.hash-graph-api 3.17% <ø> (ø)
rust.hash-graph-postgres-store 20.06% <ø> (ø)
rust.hash-graph-store 32.93% <ø> (ø)
rust.hash-graph-temporal-versioning 48.22% <ø> (ø)
rust.hash-graph-validation 83.29% <ø> (ø)
rust.hashql-ast 86.45% <ø> (ø)
rust.hashql-core 82.26% <ø> (ø)
rust.hashql-diagnostics 50.24% <ø> (ø)
rust.hashql-eval 71.85% <ø> (ø)
rust.hashql-hir 86.25% <ø> (ø)
rust.hashql-syntax-jexpr 94.20% <ø> (ø)
rust.sarif 97.93% <ø> (ø)


codecov[bot], Sep 10 '25 11:09

The benchmarks don't finish; something is going on that is not reproducible locally. Converting to draft.

TimDiekmann, Sep 10 '25 14:09

How might the CodSpeed PR (https://github.com/hashintel/hash/pull/8204) interact with this open PR? @TimDiekmann @indietyp

vilkinsons, Dec 21 '25 10:12