nostream
[REQUEST] log and visualize key performance parameters
Pledge
I pledge to pay $100 if the following gets implemented in its entirety:
Is your feature request related to a problem? Please describe.
I'm tweaking parameters, updating the relay, and observing genuine demand fluctuation and probably DoS attacks, but all I have to judge whether anything is wrong are CPU load, disk I/O, bandwidth and a few other parameters that are logged for the whole machine. That's not enough for making informed decisions.
Describe the solution you'd like
- [ ] Log key performance indicators
- [ ] Expose them through a nice interface (serverUrl/stats for example) maybe using grafana or similar
KPIs I would love to see:
- [ ] concurrent websockets open
- [ ] concurrent queries watched
- [ ] websockets opened/closed
- [ ] time from connect to first EOSE
- [ ] time from [e:[<singleEventId>] query to EOSE
- [ ] events served
- [ ] Standard system load parameters: CPU, Load, Memory, Disk I/O, Disk Usage, Bandwidth
For some of these parameters, aggregate functions like median, 95th percentile ... would be of interest.
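As a rough illustration of why the aggregates matter, the sketch below summarizes a batch of connect-to-EOSE timings with a median and a nearest-rank 95th percentile. The sample values and the `percentile` / `summarize` helpers are made up for this example; nothing here is part of nostream.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples):
    """Collapse raw timings into the aggregates asked for in the request."""
    return {
        "count": len(samples),
        "median": statistics.median(samples),
        "p95": percentile(samples, 95),
    }


# Hypothetical connect-to-EOSE timings in milliseconds (illustrative only).
eose_ms = [120, 80, 95, 300, 110, 90, 105, 100, 2500, 115]
print(summarize(eose_ms))
```

With these sample values the median stays near 100 ms while the 95th percentile is the 2500 ms outlier, which is the kind of tail behavior a plain average would blur.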
~~I'll also add $100 (of sats) to this pledge.~~ (see below) For an MVP I would be satisfied just with basic metrics for relayed nostr events.
@Giszmo Most of the stuff you're asking for has little or nothing to do with nostream itself and is already handled in other ways, or sounds relatively complex to implement.
- concurrent websockets open - that's a proxy concern, not nostream
- concurrent queries watched - medium complexity (minding connection timeouts)
- websockets opened/closed - I'd guess that's the downstream proxy's concern (e.g. nginx), not nostream
- time from connect to first EOSE - connections are handled by nginx so this is
- time from [e:[<singleEventId>] query to EOSE - high complexity (summary metric)
- events served - agreed
- Standard system load parameters: CPU, Load, Memory, Disk I/O, Disk Usage, Bandwidth - just use prometheus + node_exporter or a comparable stack
Creating a grafana dashboard is easy enough, I can do that side of it. Just need a metrics endpoint. I've done those too, but not in TS.
@Giszmo @bleetube I've shared your request with the Grafana staff to raise awareness on building a metrics collector. No promises.
Meanwhile, if you just need a metrics endpoint, you can forward whatever Prometheus metrics you'd like to long-term storage in Grafana Cloud's managed Mimir service using the Influx proxy:
https://grafana.com/docs/grafana-cloud/data-configuration/metrics/metrics-influxdb/push-from-telegraf/
Actually, I realized I can do the MVP part I described without touching Node.js. I can just write some Python that talks directly to Postgres. I started working on it this weekend in a separate repo:
https://github.com/bleetube/nostream_exporter (work in progress)
To start, it exports one metric: the total count of events in the events table. I also have select queries to add for these metrics:
- top events by kind
- top talker users by pubkey all time
- top talker users by pubkey recently
- count of paid users
I'll implement those and add in more as time permits. I'll also put together a grafana dashboard to chart them out. And if I think of any good metrics to alert on using alertmanager, I'll add those to the repo as well. Might be nice to send myself an alert if a user is spamming the relay, for instance.
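The exporter idea above can be sketched roughly as follows. This is not the code from nostream_exporter: it uses an in-memory SQLite database as a stand-in for Postgres so the example runs on its own, and the column names (`event_kind`, `event_pubkey`), the metric names, and the `collect` / `render` helpers are all assumptions made for illustration. Only the `events` table name comes from the description above.

```python
import sqlite3


def collect(conn, top_n=5):
    """Run the aggregate queries and return plain Python values."""
    total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    by_kind = conn.execute(
        "SELECT event_kind, COUNT(*) AS n FROM events "
        "GROUP BY event_kind ORDER BY n DESC, event_kind LIMIT ?",
        (top_n,),
    ).fetchall()
    by_pubkey = conn.execute(
        "SELECT event_pubkey, COUNT(*) AS n FROM events "
        "GROUP BY event_pubkey ORDER BY n DESC, event_pubkey LIMIT ?",
        (top_n,),
    ).fetchall()
    return total, by_kind, by_pubkey


def render(total, by_kind, by_pubkey):
    """Format the values in the Prometheus text exposition format.
    Metric names are hypothetical."""
    lines = [
        "# TYPE nostream_events_stored gauge",
        f"nostream_events_stored {total}",
        "# TYPE nostream_events_by_kind gauge",
    ]
    lines += [f'nostream_events_by_kind{{kind="{kind}"}} {n}' for kind, n in by_kind]
    lines.append("# TYPE nostream_events_by_pubkey gauge")
    lines += [f'nostream_events_by_pubkey{{pubkey="{pk}"}} {n}' for pk, n in by_pubkey]
    return "\n".join(lines) + "\n"


# Stand-in database with made-up rows; a real exporter would connect to the
# relay's Postgres instance instead (schema assumed, not verified).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_kind INTEGER, event_pubkey TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "aa"), (1, "aa"), (1, "bb"), (0, "bb"), (7, "aa"), (7, "cc")],
)
print(render(*collect(conn)))
```

A real exporter would swap the SQLite connection for a Postgres driver and serve the rendered text over HTTP for Prometheus to scrape. A "recent" top-talker metric would presumably add a time filter on whatever timestamp column the schema provides, and limiting the pubkey query to the top few rows keeps label cardinality bounded.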