Add service metrics for supertokens-core
🚀 Feature
As a developer or SRE supporting production workloads, I want service metrics from SuperTokens core, so that I can distinguish baseline / normal behavior from anomalous / problematic behavior, and make correlations with other metrics while troubleshooting problems.
Implementation details
Prometheus is one of the most widely used metrics systems right now, so providing a Prometheus endpoint would make this feature accessible to the widest audience. Prometheus uses a pull model: the service gathers the metrics internally, and the Prometheus server scrapes them from a well-known HTTP path on a regular cadence. See: https://github.com/prometheus/client_java#http
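To make the pull model concrete, here is a minimal sketch that uses only the JDK's built-in HTTP server rather than the Prometheus Java client linked above. The class name `MetricsEndpointSketch`, the metric name `supertokens_sessions_created_total`, and the `/metrics` path are assumptions chosen for illustration; none of them come from supertokens-core. A real implementation would more likely use the client library's own collectors and exporter.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.LongAdder;

final class MetricsEndpointSketch {
    // Hypothetical counter; a real integration would increment it wherever sessions are created.
    static final LongAdder sessionsCreated = new LongAdder();

    // Render the current counter value in the Prometheus text exposition format.
    static String render() {
        return "# HELP supertokens_sessions_created_total Sessions created since startup.\n"
             + "# TYPE supertokens_sessions_created_total counter\n"
             + "supertokens_sessions_created_total " + sessionsCreated.sum() + "\n";
    }

    // Start an HTTP server that exposes the metrics at /metrics for a scraper to pull.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders()
                    .set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) {
        sessionsCreated.increment();     // simulate one new session
        System.out.print(render());      // what a scrape of /metrics would return
    }
}
```

Calling `MetricsEndpointSketch.start(port)` with any free port would serve the same text over HTTP; the service only accumulates values, and the scraper decides when to read them.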
The usual suspects for metrics are throughput, latency, and errors. I'm not yet sure which resources and operations matter most for SuperTokens, but offhand they might be:
- New sessions/second
- Session data lookups/second
- Session logouts (wipes?)/second
- Response time for each of the above, expressed as P50, P95, and P99 percentiles
- Errors/second (of all types, or broken down by operation or error type e.g. DB connect failures)
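For readers unfamiliar with the percentile notation in the list above, the following sketch computes P50, P95, and P99 from a batch of recorded response times using the nearest-rank method. The class `LatencyPercentiles` and the sample values are hypothetical. In practice a Prometheus histogram or summary would typically estimate these without keeping every sample, so this is only meant to show what the numbers mean.

```java
import java.util.Arrays;

final class LatencyPercentiles {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // Multiply before dividing so whole-number ranks are not skewed by rounding.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up response times in milliseconds, including two slow outliers.
        long[] latencies = {12, 15, 11, 250, 14, 13, 18, 16, 17, 900};
        System.out.println("P50 = " + percentile(latencies, 50) + " ms");
        System.out.println("P95 = " + percentile(latencies, 95) + " ms");
        System.out.println("P99 = " + percentile(latencies, 99) + " ms");
    }
}
```

The high percentiles are the interesting ones for troubleshooting: a few slow requests barely move the median but dominate P95 and P99.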
Other interesting ones might be anything to do with load shedding (how many times per second are we sending a 429 Too Many Requests, presumably with a Retry-After header?) or backpressure from some other system that is load shedding (did the DB transaction latency go up? That would probably correlate directly with our own latency going up).