noria Improve system profiling tooling

Open jonhoo opened this issue 5 years ago • 3 comments

The system currently outputs very little information that is useful for profiling. It would help to expose information such as:

Time spent in different parts of domain processing.
Rate of backfills and record processing.
Time between domain wakeups.
Number of packets received/processed.
Number of domain timeouts handled.

This would be hugely helpful for nailing down performance problems (in addition to #90).
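
For illustration only, here is a minimal sketch of how a few of the counters listed above could be collected cheaply. Everything in it is an assumption: `DomainMetrics`, `Snapshot`, and all method names are hypothetical and do not exist in Noria. It uses only the standard library, with relaxed atomics so that recording a sample stays cheap on the hot path.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical per-domain counters; not part of Noria's actual API.
#[derive(Default)]
pub struct DomainMetrics {
    packets_processed: AtomicU64,
    processing_nanos: AtomicU64,
    timeouts_handled: AtomicU64,
    wakeups: AtomicU64,
    idle_nanos: AtomicU64,
}

/// A plain copy of the counters, taken at one point in time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Snapshot {
    pub packets_processed: u64,
    pub processing_nanos: u64,
    pub timeouts_handled: u64,
    pub wakeups: u64,
    pub idle_nanos: u64,
}

impl DomainMetrics {
    /// Count one processed packet and the time spent on it.
    pub fn record_packet(&self, elapsed: Duration) {
        self.packets_processed.fetch_add(1, Ordering::Relaxed);
        self.processing_nanos
            .fetch_add(elapsed.as_nanos() as u64, Ordering::Relaxed);
    }

    /// Run `f` and record how long it took as one packet's processing time.
    pub fn time_packet<R>(&self, f: impl FnOnce() -> R) -> R {
        let start = Instant::now();
        let out = f();
        self.record_packet(start.elapsed());
        out
    }

    /// Count one handled domain timeout.
    pub fn record_timeout(&self) {
        self.timeouts_handled.fetch_add(1, Ordering::Relaxed);
    }

    /// Count one wakeup and the idle time that preceded it.
    pub fn record_wakeup(&self, idle: Duration) {
        self.wakeups.fetch_add(1, Ordering::Relaxed);
        self.idle_nanos
            .fetch_add(idle.as_nanos() as u64, Ordering::Relaxed);
    }

    /// Read all counters. Each load is atomic, but the snapshot as a whole
    /// is not, so concurrent updates may be partially reflected.
    pub fn snapshot(&self) -> Snapshot {
        Snapshot {
            packets_processed: self.packets_processed.load(Ordering::Relaxed),
            processing_nanos: self.processing_nanos.load(Ordering::Relaxed),
            timeouts_handled: self.timeouts_handled.load(Ordering::Relaxed),
            wakeups: self.wakeups.load(Ordering::Relaxed),
            idle_nanos: self.idle_nanos.load(Ordering::Relaxed),
        }
    }
}
```

Rates (for example, records processed per second) would then come from diffing two snapshots taken a known interval apart. The sketch leaves out the backfill rate, and a per-phase breakdown of domain processing time would need a separate counter for each phase.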

Sep 19 '18 16:09 jonhoo

Also:

Read retries + time-to-completion
Domain frequency + processing time per destination node

Sep 19 '18 20:09 jonhoo

I have not really done this kind of thing before but I would be happy to give it a shot if someone can help point me in the right direction. I should also note that while I know a bit of Rust, I have not done a ton of actual work in it.

Oct 10 '18 11:10 jbcden

@jbcden Thanks for the offer! Thinking some more about this, I suspect this will actually require some relatively large-scale system refactoring to allow capturing all the metrics we care about. In particular, it's not immediately obvious to me how we store and report these metrics in a meaningful way and without overhead. I'm going to remove the good-first-issue tag for the time being.

Oct 12 '18 15:10 jonhoo
