
feat(dfir_rs): track and expose DFIR runtime metrics

Status: Open. MingweiSamuel opened this issue 2 months ago • 3 comments

Tokio runtime metrics: https://docs.rs/tokio/latest/tokio/runtime/struct.RuntimeMetrics.html

Per-subgraph metrics:

  • total_run_count - how many times the subgraph has been run
  • total_poll_duration - total time the subgraph has spent running
  • total_poll_count - how many times the subgraph has been polled
  • total_idle_duration - total time the subgraph has spent "idle" (not running and not complete)
  • total_idle_count - how many times the subgraph has become idle
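As a rough illustration, the per-subgraph metrics above could be plain counters owned by the scheduler. This is a hypothetical sketch: SubgraphMetrics and its methods are not part of dfir_rs, and a closure stands in for one poll of a subgraph.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-subgraph counters; field names follow the list above.
/// Not the dfir_rs API.
#[derive(Debug, Default, Clone)]
pub struct SubgraphMetrics {
    /// How many times the subgraph has been run.
    pub total_run_count: u64,
    /// Total time spent inside polls of the subgraph.
    pub total_poll_duration: Duration,
    /// How many times the subgraph has been polled.
    pub total_poll_count: u64,
    /// Total time spent idle (not running and not complete).
    pub total_idle_duration: Duration,
    /// How many idle periods have been recorded.
    pub total_idle_count: u64,
}

impl SubgraphMetrics {
    /// Record the start of a new run.
    pub fn record_run(&mut self) {
        self.total_run_count += 1;
    }

    /// Record one poll that took `elapsed`.
    pub fn record_poll(&mut self, elapsed: Duration) {
        self.total_poll_count += 1;
        self.total_poll_duration += elapsed;
    }

    /// Record one idle period of length `elapsed`.
    pub fn record_idle(&mut self, elapsed: Duration) {
        self.total_idle_count += 1;
        self.total_idle_duration += elapsed;
    }

    /// Time a closure that stands in for one poll of the subgraph body.
    pub fn time_poll<T>(&mut self, poll: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = poll();
        self.record_poll(start.elapsed());
        out
    }
}
```

Because the fields are ordinary integers and Durations updated through &mut self, recording a metric costs a clock read and an addition, with no synchronization.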

Per-handoff metrics:

  • total_items_count - total number of items inserted into this handoff
  • current_items_count (tentative) - number of items currently in the handoff
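One possible way to track the handoff metrics is sketched below, assuming a simple VecDeque-backed buffer; InstrumentedHandoff and its methods are hypothetical names, not the dfir_rs handoff API. Here current_items_count is read from the buffer length instead of being stored, which is one answer to whether it needs its own counter.

```rust
use std::collections::VecDeque;

/// Hypothetical handoff buffer that tracks the per-handoff metrics above.
#[derive(Debug)]
pub struct InstrumentedHandoff<T> {
    buffer: VecDeque<T>,
    /// Total items ever inserted into this handoff (never decreases).
    total_items_count: u64,
}

impl<T> InstrumentedHandoff<T> {
    pub fn new() -> Self {
        Self { buffer: VecDeque::new(), total_items_count: 0 }
    }

    /// Insert one item, bumping the cumulative counter.
    pub fn give(&mut self, item: T) {
        self.total_items_count += 1;
        self.buffer.push_back(item);
    }

    /// Drain every buffered item, as the receiving subgraph would.
    pub fn take_all(&mut self) -> Vec<T> {
        self.buffer.drain(..).collect()
    }

    pub fn total_items_count(&self) -> u64 {
        self.total_items_count
    }

    /// Items currently buffered, derived from the buffer itself.
    pub fn current_items_count(&self) -> usize {
        self.buffer.len()
    }
}
```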

MingweiSamuel, Oct 29 '25 16:10

My comments from tracking issue #2178:

Looking into how best to instrument subgraph execution. tokio-metrics provides comprehensive, general-purpose instrumentation of async tasks, but using it in DFIR made some microbenchmarks run 15% longer. tokio-metrics records more properties than we need (at least for now), and it uses atomics and Arc to support multiple threads, whereas we could track everything directly in SubgraphData. Based on this, I think it is best to implement custom subgraph instrumentation in DFIR, using tokio-metrics as inspiration.
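Custom instrumentation in the spirit of tokio-metrics, but without Arc or atomics, might look roughly like the future wrapper below. It is a minimal sketch under assumptions: Instrumented and PollMetrics are hypothetical names, the wrapped future must be Unpin to avoid unsafe pin projection, and the whole gap between a Pending result and the next poll is counted as idle time, matching this issue's definition of idle (not running and not complete).

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::{Duration, Instant};

/// Plain, non-atomic counters. A single-threaded scheduler can own these
/// directly instead of sharing them behind Arc/atomics.
#[derive(Debug, Default)]
pub struct PollMetrics {
    pub total_poll_count: u64,
    pub total_poll_duration: Duration,
    pub total_idle_count: u64,
    pub total_idle_duration: Duration,
}

/// Hypothetical future wrapper that times each poll and each idle gap.
pub struct Instrumented<F> {
    inner: F,
    /// Set when the inner future returns Pending; the time until the next
    /// poll starts is counted as one idle period.
    idle_since: Option<Instant>,
    pub metrics: PollMetrics,
}

impl<F> Instrumented<F> {
    pub fn new(inner: F) -> Self {
        Self { inner, idle_since: None, metrics: PollMetrics::default() }
    }
}

// Requiring F: Unpin makes Instrumented<F> Unpin too, so the fields can be
// reached through Pin without unsafe code.
impl<F: Future + Unpin> Future for Instrumented<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let this = &mut *self;
        let start = Instant::now();
        // Close the idle period that began when the previous poll returned Pending.
        if let Some(idle_start) = this.idle_since.take() {
            this.metrics.total_idle_count += 1;
            this.metrics.total_idle_duration += start - idle_start;
        }
        let result = Pin::new(&mut this.inner).poll(cx);
        let end = Instant::now();
        this.metrics.total_poll_count += 1;
        this.metrics.total_poll_duration += end - start;
        // Still pending: the task is now idle (not running, not complete).
        if result.is_pending() {
            this.idle_since = Some(end);
        }
        result
    }
}
```

In DFIR itself the counters would presumably live in SubgraphData rather than in a wrapper, but the timing logic would be the same.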

MingweiSamuel, Oct 29 '25 23:10

I think we can start with a few important metrics per subgraph execution:

  • total_poll_duration - total time the subgraph has spent running
  • total_poll_count - how many times the subgraph has been polled
  • total_idle_duration - total time the subgraph has spent "idle" (not running and not complete)
  • total_idle_count - how many times the subgraph has become idle
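Since these four values are monotonically increasing totals, averages and per-interval numbers can be derived from them afterwards. The sketch below shows that derivation; SubgraphSnapshot, mean_poll_duration, mean_idle_duration, and since are hypothetical names chosen for illustration.

```rust
use std::time::Duration;

/// Snapshot of the four starter metrics for one subgraph (hypothetical type).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct SubgraphSnapshot {
    pub total_poll_duration: Duration,
    pub total_poll_count: u64,
    pub total_idle_duration: Duration,
    pub total_idle_count: u64,
}

/// Average of `total` over `count` events; zero when nothing was recorded.
fn mean(total: Duration, count: u64) -> Duration {
    if count == 0 {
        Duration::ZERO
    } else {
        Duration::from_nanos((total.as_nanos() / u128::from(count)) as u64)
    }
}

impl SubgraphSnapshot {
    pub fn mean_poll_duration(&self) -> Duration {
        mean(self.total_poll_duration, self.total_poll_count)
    }

    pub fn mean_idle_duration(&self) -> Duration {
        mean(self.total_idle_duration, self.total_idle_count)
    }

    /// Metrics for the interval between `earlier` and `self`. The totals only
    /// grow, so this panics only if `earlier` was actually taken later.
    pub fn since(&self, earlier: &SubgraphSnapshot) -> SubgraphSnapshot {
        SubgraphSnapshot {
            total_poll_duration: self.total_poll_duration - earlier.total_poll_duration,
            total_poll_count: self.total_poll_count - earlier.total_poll_count,
            total_idle_duration: self.total_idle_duration - earlier.total_idle_duration,
            total_idle_count: self.total_idle_count - earlier.total_idle_count,
        }
    }
}
```

Sampling a snapshot periodically and calling since on consecutive samples gives per-interval metrics without the runtime having to reset any counters.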

MingweiSamuel, Oct 29 '25 23:10

DFIR runtime-level metrics could include the number of external events received.
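A minimal sketch of such a counter, assuming external events arrive over a std::sync::mpsc channel that the runtime drains between subgraph runs. DfirRuntimeMetrics and drain_external are hypothetical names, and the real DFIR ingestion path may differ.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Hypothetical runtime-wide counters (not the dfir_rs API).
#[derive(Debug, Default)]
pub struct DfirRuntimeMetrics {
    /// Number of events received from outside the DFIR graph.
    pub external_events_received: u64,
}

/// Move every currently available external event into `buf`, counting each one.
/// Returns false once the channel is empty and every sender has been dropped.
pub fn drain_external<T>(
    rx: &Receiver<T>,
    buf: &mut Vec<T>,
    metrics: &mut DfirRuntimeMetrics,
) -> bool {
    loop {
        match rx.try_recv() {
            Ok(event) => {
                metrics.external_events_received += 1;
                buf.push(event);
            }
            Err(TryRecvError::Empty) => return true,
            Err(TryRecvError::Disconnected) => return false,
        }
    }
}
```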

MingweiSamuel, Nov 08 '25 00:11