feat(cli): add metrics server to iroh doctor
Description
Folks try to look at metrics while only doctor is running, so this adds a metrics server to doctor. Note that by default this clashes with the iroh node if both are running locally, since both use port 9090. That is also inconvenient if you want to look at a running iroh node while doctor is running.
Suggestions welcome. Maybe metrics should be off by default on doctor?
Also most metrics are not really used in the doctor path, but we can fix that as we figure out what's cool to measure.
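To make the port clash concrete, here is a minimal sketch using only the Rust standard library, not iroh's actual metrics server. The helper name `second_bind_error` is made up for illustration, and the sketch binds an OS-assigned port rather than 9090 so it never interferes with a real node.

```rust
use std::io::ErrorKind;
use std::net::TcpListener;

/// Hypothetical helper: try to bind a second listener on the address that
/// `port_holder` already occupies and report the kind of error returned.
/// Returns `None` if the second bind unexpectedly succeeds.
fn second_bind_error(port_holder: &TcpListener) -> Option<ErrorKind> {
    let addr = port_holder.local_addr().ok()?;
    match TcpListener::bind(addr) {
        Ok(_) => None,
        Err(e) => Some(e.kind()),
    }
}

fn main() -> std::io::Result<()> {
    // Port 0 asks the OS for any free port; a real node would use 9090 here.
    let first = TcpListener::bind("127.0.0.1:0")?;
    println!("first listener on {}", first.local_addr()?);

    // The second process (e.g. doctor) trying the same port is rejected.
    match second_bind_error(&first) {
        Some(kind) => println!("second bind failed: {kind:?}"),
        None => println!("second bind unexpectedly succeeded"),
    }
    Ok(())
}
```

On common platforms the second bind is rejected with an "address in use" error, which is the failure you hit when a node and doctor both default to the same metrics port.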
Breaking Changes
Notes & open questions
Change checklist
- [x] Self-review.
- [ ] Documentation updates if relevant.
- [ ] Tests if relevant.
- [ ] All breaking changes documented.
Maybe I'm doing something wrong, but it seems to me this conflicts with plot. If a node is running with metrics, I get a TUI dashboard but also an error, and if no iroh node is running I get no error but no data (which I guess is expected).
Conversely, if I start doctor plot first I get an empty dashboard, but then I cannot start a node with metrics for plot to track.
I think my expectations are:
- If nothing is running (no iroh node, no doctor), and I run `doctor plot`, I would not expect the metrics server to be started (because the plotting tool isn't doing any magic socket stuff)
- If no iroh node is running, but I'm doing testing with `doctor accept` or `doctor connect`, I'd like to be able to use `doctor plot` to monitor things
- If an iroh node is running, and I run `doctor accept`, I'm not 100% sure what I expect. Part of me thinks that maybe the doctor should use the running node for its operations, but for now let's assume there are good reasons to not do that. So I think I'd like the chance to `plot` both the iroh node and the doctor.
So maybe something like:
- `doctor accept` and `doctor connect` gain a new `--expose-metrics` option, which is off-by-default
- The `--expose-metrics` option can take an optional port number to listen on, instead of 9090
- The `doctor plot` command gains an option to specify the port/URL to connect to
- If I have both a node running and a doctor running, it's my responsibility to manage the ports correctly to avoid any collision
This should now work reasonably well for the above use cases. Metrics are off by default, and you can turn them on when you like by setting the metrics port, e.g.:
shell A
iroh --metrics-port 9091 doctor relay-urls
shell B
iroh doctor plot --scrape-url http://localhost:9091 iroh_requests_total_total
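For anyone curious what a scrape of that URL involves, here is a rough sketch of pulling one metric out of a scraped body. It assumes the endpoint serves the Prometheus text exposition format, it is not the code `doctor plot` actually uses, and `sample_value` plus the sample body with its values are made up for illustration. Label values containing `}` are not handled.

```rust
/// Hypothetical helper: return the first sample value for `name` in a
/// Prometheus-style text body, or `None` if the metric is not present.
fn sample_value(body: &str, name: &str) -> Option<f64> {
    body.lines()
        .map(str::trim)
        // Skip blank lines and `#` comment lines such as HELP and TYPE.
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .find_map(|line| {
            let rest = line.strip_prefix(name)?;
            // The name must be followed by a label set or whitespace,
            // otherwise `foo` would also match `foo_total`.
            let rest = if rest.starts_with('{') {
                rest.split_once('}')?.1
            } else if rest.starts_with(char::is_whitespace) {
                rest
            } else {
                return None;
            };
            rest.split_whitespace().next()?.parse::<f64>().ok()
        })
}

fn main() {
    // Made-up scrape result; real output from the metrics port will differ.
    let body = "# TYPE iroh_requests_total counter\n\
                iroh_requests_total_total 3\n\
                other_metric{kind=\"demo\"} 1.5\n";
    println!("{:?}", sample_value(body, "iroh_requests_total_total"));
    println!("{:?}", sample_value(body, "other_metric"));
}
```

A plotting loop would fetch the body from the scrape URL at a fixed interval and feed each extracted value into the chart.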