noria

Improve system profiling tooling

Open jonhoo opened this issue 5 years ago • 3 comments

The system currently outputs very little information that is useful for profiling. It would help to capture information such as:

  • Time spent in different parts of domain processing.
  • Rate of backfills and record processing.
  • Time between domain wakeups.
  • Number of packets received/processed.
  • Number of domain timeouts handled.

This would be hugely helpful for nailing down performance problems (in addition to #90).
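
To make the list above concrete, here is a minimal sketch of what a per-domain stats structure could look like. Everything in it (`DomainStats`, `on_wakeup`, `time_phase`, the field names) is hypothetical and chosen for illustration; none of it is an existing API in the codebase, and it ignores the question of how the numbers get reported.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-domain profiling counters (illustrative names only).
#[derive(Debug, Default)]
pub struct DomainStats {
    /// Accumulated time per named phase of domain processing.
    pub phase_time: HashMap<&'static str, Duration>,
    pub packets_received: u64,
    pub packets_processed: u64,
    pub records_processed: u64,
    pub backfills: u64,
    pub timeouts_handled: u64,
    /// Sum of the gaps between consecutive wakeups, and how many gaps were seen.
    pub wakeup_gap_total: Duration,
    pub wakeup_gaps: u64,
    last_wakeup: Option<Instant>,
}

impl DomainStats {
    /// Record a domain wakeup at `now` and accumulate the gap since the previous one.
    pub fn on_wakeup(&mut self, now: Instant) {
        if let Some(prev) = self.last_wakeup {
            self.wakeup_gap_total += now.saturating_duration_since(prev);
            self.wakeup_gaps += 1;
        }
        self.last_wakeup = Some(now);
    }

    /// Run `f` and add its wall-clock time to the named phase.
    pub fn time_phase<T>(&mut self, phase: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        *self.phase_time.entry(phase).or_default() += start.elapsed();
        out
    }

    /// Mean time between wakeups, or `None` if fewer than two wakeups were seen.
    pub fn mean_wakeup_gap(&self) -> Option<Duration> {
        if self.wakeup_gaps == 0 {
            None
        } else {
            Some(self.wakeup_gap_total / self.wakeup_gaps as u32)
        }
    }
}
```

Rates (backfills or records per second) would then fall out of dividing a counter by the time elapsed between two snapshots, rather than being tracked directly.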

jonhoo, Sep 19 '18 16:09

Also:

  • Read retries + time-to-completion
  • Domain frequency + processing time per destination node

jonhoo, Sep 19 '18 20:09

I have not really done this kind of thing before but I would be happy to give it a shot if someone can help point me in the right direction. I should also note that while I know a bit of Rust, I have not done a ton of actual work in it.

jbcden, Oct 10 '18 11:10

@jbcden Thanks for the offer! Thinking some more about this, I suspect this will actually require some relatively large-scale system refactoring to allow capturing all the metrics we care about. In particular, it's not immediately obvious to me how we store and report these metrics in a meaningful way and without overhead. I'm going to remove the good-first-issue tag for the time being.
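
As one possible direction for the overhead concern (an assumption for illustration, not a settled design): each domain could bump plain local counters on the hot path and only occasionally flush them into shared atomics that a reporter thread reads. The names `SharedCounters`, `LocalCounters`, and `flush_every` below are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical totals shared between a domain thread and a reporter.
#[derive(Default)]
pub struct SharedCounters {
    pub packets: AtomicU64,
    pub timeouts: AtomicU64,
}

/// Hypothetical domain-local counters that batch updates to `SharedCounters`.
pub struct LocalCounters {
    shared: Arc<SharedCounters>,
    packets: u64,
    timeouts: u64,
    pending: u64,
    flush_every: u64,
}

impl LocalCounters {
    pub fn new(shared: Arc<SharedCounters>, flush_every: u64) -> Self {
        LocalCounters { shared, packets: 0, timeouts: 0, pending: 0, flush_every }
    }

    /// Count one packet; this is a plain integer add unless a flush is due.
    pub fn packet(&mut self) {
        self.packets += 1;
        self.bump();
    }

    /// Count one handled timeout.
    pub fn timeout(&mut self) {
        self.timeouts += 1;
        self.bump();
    }

    fn bump(&mut self) {
        self.pending += 1;
        if self.pending >= self.flush_every {
            self.flush();
        }
    }

    /// Publish the local counts to the shared totals and reset the local state.
    pub fn flush(&mut self) {
        // Relaxed suffices here: these are independent statistics, not synchronization.
        self.shared.packets.fetch_add(self.packets, Ordering::Relaxed);
        self.shared.timeouts.fetch_add(self.timeouts, Ordering::Relaxed);
        self.packets = 0;
        self.timeouts = 0;
        self.pending = 0;
    }
}
```

The trade-off in this sketch is staleness: the reporter can lag by up to `flush_every` events per domain, in exchange for keeping atomic operations off the per-packet path.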

jonhoo, Oct 12 '18 15:10