Add monitoring service
At one point, MaMpf already made use of Prometheus, but somehow that fell into oblivion. Metrics can be very helpful in determining how fast our server responds and where the bottlenecks are. Prometheus seems to be a good choice. Let's evaluate whether the currently included (but unused) gem prometheus_exporter by Discourse is still the one we want to go with.
❓ Some questions I'd like to get answered
- Number of logins, signups, signouts, user deletions
- What is the current/average RAM consumption (per Puma worker, but also aggregated)?
- What is the current/average CPU load (per Puma worker, but also aggregated)?
- What is the current latency?
- How much disk space do user submissions and other types of media take up?
- How many users do we have? How many logged in during the last 30 days? (keep in mind "Remember me" cookies)
- Maybe a tiny bit of anonymous user tracking: what are the most visited MaMpf pages (and more fine-grained: most played videos etc.)?
- Show Sidekiq stats, i.e. when are jobs scheduled, are there errors?
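As a rough sketch of how the "logged in during the last 30 days" question could be turned into a metric: the snippet below counts recently active users and renders gauges in the Prometheus text exposition format. The `UserActivity` struct, the `last_seen_at` field and the `mampf_*` metric names are assumptions made for illustration, not existing MaMpf code. A "last seen" timestamp is assumed instead of a sign-in timestamp because users returning via a "Remember me" cookie never sign in again explicitly.

```ruby
# Hypothetical record: `last_seen_at` is an assumed field that would be
# updated on any authenticated request, not only on explicit sign-in.
UserActivity = Struct.new(:id, :last_seen_at)

SECONDS_PER_DAY = 24 * 60 * 60

# Counts users whose last activity falls within the last `days` days.
# Users without any recorded activity (nil) are not counted.
def active_user_count(users, now: Time.now, days: 30)
  cutoff = now - days * SECONDS_PER_DAY
  users.count { |u| u.last_seen_at && u.last_seen_at >= cutoff }
end

# Renders gauges in the Prometheus text exposition format.
# `gauges` maps a metric name to a [help text, value] pair.
def render_gauges(gauges)
  gauges.map do |name, (help, value)|
    "# HELP #{name} #{help}\n# TYPE #{name} gauge\n#{name} #{value}\n"
  end.join
end

# Example usage with made-up data:
now = Time.utc(2024, 5, 31)
users = [
  UserActivity.new(1, Time.utc(2024, 5, 20)),
  UserActivity.new(2, Time.utc(2024, 3, 1)),
  UserActivity.new(3, nil)
]
puts render_gauges(
  "mampf_users_total" => ["Number of registered users", users.size],
  "mampf_users_active_30d" => ["Users seen in the last 30 days",
                               active_user_count(users, now: now)]
)
```

In a real setup, the prometheus_exporter gem (or whichever library we end up choosing) would take care of the rendering. The point here is only to show what data we would need to collect.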
🚨 Alerts
Prometheus allows us to send alerts (e.g. via mail) according to rules that we define. For example:
- Send an alert when the CPU usage stays above 90% for more than 10 minutes.
- Send an alert when we're running out of disk space (e.g. 80% of disk space used).
- Send an alert when the app becomes too sluggish, e.g. when the average latency over the last 5 minutes exceeds 5 s.
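To make the "stays above 90% for more than 10 minutes" kind of rule more concrete, here is a small Ruby sketch of the underlying idea: an alert only fires if the condition held for the whole trailing window, not on a single spike. This is just an illustration of the logic under simplified assumptions. In Prometheus itself we would express this declaratively in an alerting rule rather than in application code. The `Sample` struct and `sustained_above?` are hypothetical names.

```ruby
# One measurement: a timestamp in seconds and a value
# (e.g. CPU usage in percent).
Sample = Struct.new(:time, :value)

# Returns true only if the samples cover at least `duration` seconds
# and every sample within that trailing window exceeds `threshold`.
def sustained_above?(samples, threshold:, duration:)
  return false if samples.empty?

  latest = samples.max_by(&:time).time
  window_start = latest - duration

  # Without a sample at or before the window start, we cannot claim
  # that the condition held for the whole duration.
  return false unless samples.any? { |s| s.time <= window_start }

  samples.select { |s| s.time >= window_start }
         .all? { |s| s.value > threshold }
end

# Example usage: one sample per minute over 10 minutes, all at 95%.
samples = (0..10).map { |i| Sample.new(i * 60, 95.0) }
puts sustained_above?(samples, threshold: 90, duration: 600)
```

A single dip below the threshold inside the window resets the alert, which is what keeps short spikes from triggering mails.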
Access to the metrics
Of course, these metrics are not meant for the public. I'm not yet sure where to show them. In any case, there should be a nice responsive dashboard that shows only what we're asking for and that is customizable.
Some options for where this dashboard could be shown:
- Have a /metrics endpoint as a password-protected zone. However, what if MaMpf is down? Then we cannot even access our own metrics...
- So maybe a better idea is to send the metrics internally to another server that is available via another URL and serves as an aggregator.
In any case, I think it'd be beneficial if the steps to see the metrics are minimal, otherwise we won't look at them often. That means I'd like to see them without being connected to Cisco, i.e. make them available on the WWW, but of course password-protected (just as an idea).
Error tracking
This might also be a good opportunity to show errors in a unified metrics dashboard (maybe in addition to sending mails, just to keep them as a last resort backup option).
Dependencies
It would be nice if a metrics dashboard could also show the versions of all packages and where new ones are available. This is especially relevant for new major versions, since our update commands only update the patch and minor versions of a gem. But this might be easier to achieve in GitHub itself via Dependabot etc.
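Just to illustrate the "new major version" check, here is a minimal sketch using `Gem::Version` from RubyGems. The `installed` and `available` hashes and their version numbers are made up for the example. In practice the data would have to come from the lockfile and from a registry lookup (or we simply let Dependabot do it).

```ruby
require "rubygems" # provides Gem::Version; already loaded by default

# True if `latest` has a higher major version than `current`.
def major_update?(current, latest)
  Gem::Version.new(latest).segments.first >
    Gem::Version.new(current).segments.first
end

# Returns the names of gems for which a new major version exists.
# Both arguments map a gem name to a version string.
def major_updates(installed, available)
  installed.select do |name, current|
    available.key?(name) && major_update?(current, available[name])
  end.keys
end

# Example usage with illustrative (not actual) version numbers:
installed = { "rails" => "6.1.7", "puma" => "5.6.5", "sidekiq" => "6.5.0" }
available = { "rails" => "7.1.0", "puma" => "5.6.9", "sidekiq" => "7.2.0" }
puts major_updates(installed, available).inspect
```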
For a dashboard view over the data, we might want to use Grafana.
Also see the prometheus_exporter gem and the linked blog post that can serve as a guide on how to set things up.