noria
Improve system profiling tooling
The system currently outputs very little information that is useful for profiling. It would help to expose information such as:
- Time spent in different parts of domain processing.
- Rate of backfills and record processing.
- Time between domain wakeups.
- Number of packets received/processed.
- Number of domain timeouts handled.
This would be hugely helpful for nailing down performance problems (in addition to #90).
Also:
- Read retries + time-to-completion
- Domain frequency + processing time per destination node
I have not really done this kind of thing before but I would be happy to give it a shot if someone can help point me in the right direction. I should also note that while I know a bit of Rust, I have not done a ton of actual work in it.
@jbcden Thanks for the offer! Thinking some more about this, I suspect this will actually require some relatively large-scale system refactoring to allow capturing all the metrics we care about. In particular, it's not immediately obvious to me how we store and report these metrics in a meaningful way and without overhead. I'm going to remove the good-first-issue tag for the time being.
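To make the storage-and-overhead question a bit more concrete, here is a minimal sketch of one possible approach: per-domain counters kept in relaxed atomics, with a cheap snapshot for reporting. Everything in it (`DomainMetrics`, `MetricsSnapshot`, `record_packet`, `record_timeout`, `time_packet`, `snapshot`) is hypothetical and not part of noria's actual code; it covers only a couple of the metrics listed above and says nothing about how they would be wired into domain processing.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical per-domain counters; an illustration only, not noria's API.
#[derive(Default)]
pub struct DomainMetrics {
    packets_processed: AtomicU64,
    timeouts_handled: AtomicU64,
    processing_nanos: AtomicU64,
}

/// Plain-value copy of the counters, suitable for logging or export.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MetricsSnapshot {
    pub packets_processed: u64,
    pub timeouts_handled: u64,
    pub processing_time: Duration,
}

impl DomainMetrics {
    /// Count one processed packet and add the time spent on it.
    pub fn record_packet(&self, elapsed: Duration) {
        // Relaxed ordering: these are independent statistics, so no
        // cross-counter ordering guarantees are needed.
        self.packets_processed.fetch_add(1, Ordering::Relaxed);
        self.processing_nanos
            .fetch_add(elapsed.as_nanos() as u64, Ordering::Relaxed);
    }

    /// Count one handled domain timeout.
    pub fn record_timeout(&self) {
        self.timeouts_handled.fetch_add(1, Ordering::Relaxed);
    }

    /// Run `f`, record it as one processed packet, and return its result.
    pub fn time_packet<T>(&self, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.record_packet(start.elapsed());
        out
    }

    /// Read all counters. The fields are loaded one at a time, so the
    /// snapshot may be slightly inconsistent under concurrent updates.
    pub fn snapshot(&self) -> MetricsSnapshot {
        MetricsSnapshot {
            packets_processed: self.packets_processed.load(Ordering::Relaxed),
            timeouts_handled: self.timeouts_handled.load(Ordering::Relaxed),
            processing_time: Duration::from_nanos(
                self.processing_nanos.load(Ordering::Relaxed),
            ),
        }
    }
}
```

Each update is a single atomic add, so the hot-path cost should be small, but calling `Instant::now()` around every packet is not free; whether that is acceptable would need measuring. Rates (for example, packets per second) could be derived by diffing two snapshots taken a known interval apart.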