[REQUEST] log and visualize key performance parameters

Open Giszmo opened this issue 2 years ago • 3 comments

Pledge

I pledge to pay $100 if the following gets implemented in its entirety:

Is your feature request related to a problem? Please describe.

I'm tweaking parameters, updating the relay, and observing genuine demand fluctuations and probably DoS attacks, but I only have CPU load, disk I/O, bandwidth and a few other parameters, all logged for the whole machine, to judge whether anything is wrong. That's not enough to make informed decisions.

Describe the solution you'd like

  • [ ] Log key performance indicators
  • [ ] Expose them through a nice interface (serverUrl/stats for example) maybe using grafana or similar

KPIs I would love to see:

  • [ ] concurrent websockets open
  • [ ] concurrent queries watched
  • [ ] websockets opened/closed
  • [ ] time from connect to first EOSE
  • [ ] time from [e:[<singleEventId>] query to EOSE
  • [ ] events served
  • [ ] Standard system load parameters: CPU, Load, Memory, Disk I/O, Disk Usage, Bandwidth

For some of these parameters, aggregate functions like median, 95th percentile ... would be of interest.
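As an illustration of the aggregates mentioned above, here is a minimal Python sketch that reduces a list of latency samples (for example, connect-to-first-EOSE times) to a median and a 95th percentile. The function name `summarize` and the millisecond unit are assumptions made for this example; nothing like this exists in nostream today.

```python
import statistics


def summarize(samples_ms):
    """Return the median and 95th percentile of latency samples (hypothetical helper)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    # The "inclusive" method treats the samples as the whole population.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"median": statistics.median(samples_ms), "p95": cuts[94]}
```

In practice a Prometheus histogram or summary would probably do this server-side; the sketch only shows what the two numbers mean.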

Giszmo avatar Jan 18 '23 14:01 Giszmo

~~I'll also add $100 (of sats) to this pledge.~~ (see below) For an MVP I would be satisfied just with basic metrics for relayed nostr events.

@Giszmo Most of the stuff you're asking for either has little or nothing to do with nostream itself and is already handled in other ways, or sounds relatively complex to implement.

  • concurrent websockets open - that's a proxy concern, not nostream
  • concurrent queries watched - medium complexity (minding connection timeouts)
  • websockets opened/closed - I'd guess that's the downstream proxy's concern (e.g. nginx), not nostream
  • time from connect to first EOSE - connections are handled by nginx so this is
  • time from [e:[<singleEventId>] query to EOSE - high complexity (summary metric)
  • events served - agreed
  • Standard system load parameters: CPU, Load, Memory, Disk I/O, Disk Usage, Bandwidth - just use prometheus + node_exporter or a comparable stack

Creating a grafana dashboard is easy enough, I can do that side of it. Just need a metrics endpoint. I've done those too, but not in TS.
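To make "just need a metrics endpoint" concrete, here is a rough Python sketch of one, using only the standard library and the Prometheus text exposition format. The metric name `nostream_events_served_total`, the `Counter` class and port 9465 are all made up for illustration; a real implementation would live inside nostream (in TS) and would more likely use an existing Prometheus client library.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """A tiny thread-safe counter that renders itself in Prometheus text format."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount=1):
        with self._lock:
            self._value += amount

    def render(self):
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


# Hypothetical metric: the relay would call EVENTS_SERVED.inc() per event sent.
EVENTS_SERVED = Counter("nostream_events_served_total", "Events sent to clients.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = EVENTS_SERVED.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9465):
    """Block forever, serving /metrics on localhost (port is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a Prometheus scrape job at `127.0.0.1:9465/metrics` would be enough for a Grafana panel on events served.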

bleetube avatar Feb 01 '23 16:02 bleetube

@Giszmo @bleetube I've shared your request with the Grafana staff to raise awareness on building a metrics collector. No promises.

Meanwhile, if you just need a metrics endpoint, you can forward whatever Prometheus metrics you'd like to long term storage in Grafana Cloud's managed Mimir service using the Influx proxy:

https://grafana.com/docs/grafana-cloud/data-configuration/metrics/metrics-influxdb/push-from-telegraf/

jmarbach avatar Feb 08 '23 15:02 jmarbach

Actually I realized I can do the MVP part I described without touching nodejs. I can just write some python to talk directly to postgres. I have started working on it this weekend in a separate repo:

https://github.com/bleetube/nostream_exporter (work in progress)

To start, it exports one metric: the total count of events in the events table. I also have select queries ready to add for these metrics:

  • top events by kind
  • top talker users by pubkey all time
  • top talker users by pubkey recently
  • count of paid users
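For a sense of what one of these looks like, here is a rough sketch of the "top events by kind" query and how its rows could be turned into Prometheus text. It is not the code from the exporter repo. It assumes an `events` table with `event_kind` and `event_pubkey` columns, and it uses an in-memory SQLite database as a stand-in for Postgres so the example runs on its own; with a Postgres driver such as psycopg the placeholder would be `%s` instead of `?`. The metric name `nostream_events_by_kind` is also made up.

```python
import sqlite3

# Assumed schema: events(event_pubkey, event_kind). Ties are broken by kind
# so the ordering is deterministic.
TOP_KINDS_SQL = """
SELECT event_kind, COUNT(*) AS n
FROM events
GROUP BY event_kind
ORDER BY n DESC, event_kind ASC
LIMIT ?
"""


def top_kinds(conn, limit=10):
    """Return (kind, count) rows for the most common event kinds."""
    return conn.execute(TOP_KINDS_SQL, (limit,)).fetchall()


def to_prometheus(rows):
    """Render (kind, count) rows as a labelled gauge in Prometheus text format."""
    lines = ["# TYPE nostream_events_by_kind gauge"]
    for kind, n in rows:
        lines.append(f'nostream_events_by_kind{{kind="{kind}"}} {n}')
    return "\n".join(lines) + "\n"


def demo_connection():
    """Build a throwaway SQLite database with a few fake events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_pubkey TEXT, event_kind INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("a", 1), ("a", 1), ("b", 1), ("b", 7)],
    )
    return conn
```

The "top talker" queries would follow the same shape, grouping by `event_pubkey` instead, with a time filter for the "recently" variant.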

I'll implement those and add in more as time permits. I'll also put together a grafana dashboard to chart them out. And if I think of any good metrics to alert on using alertmanager, I'll add those to the repo as well. Might be nice to send myself an alert if a user is spamming the relay, for instance.

bleetube avatar Feb 18 '23 04:02 bleetube