Add service metrics for supertokens-core

Open sayotte opened this issue 3 years ago • 0 comments

🚀 Feature

As a developer or SRE supporting production workloads, I want service metrics from SuperTokens core, so that I can distinguish baseline / normal behavior from anomalous / problematic behavior, and make correlations with other metrics while troubleshooting problems.

Implementation details

Prometheus is the most popular metrics engine right now, so providing a Prometheus endpoint would make this feature accessible to the widest audience. Prometheus uses a pull model: the service gathers metrics internally, and the Prometheus server scrapes them from a well-known HTTP path (conventionally `/metrics`) on a regular cadence. See: https://github.com/prometheus/client_java#http
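To make the pull model concrete, here is a minimal sketch that uses only the JDK's built-in `HttpServer` and hand-writes the Prometheus text exposition format. It is not the client_java API and not existing supertokens-core code: the class name, the metric name `supertokens_sessions_created_total`, and the port argument are all assumptions made for illustration. A real implementation would more likely use the client library linked above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: none of these names exist in supertokens-core.
class MetricsEndpointSketch {
    // Monotonic counter that application code increments on each new session.
    static final LongAdder sessionsCreated = new LongAdder();

    // Render the current counter value in the Prometheus text exposition format.
    static String render() {
        return "# HELP supertokens_sessions_created_total Sessions created since process start.\n"
             + "# TYPE supertokens_sessions_created_total counter\n"
             + "supertokens_sessions_created_total " + sessionsCreated.sum() + "\n";
    }

    // Start an HTTP server that serves render() at /metrics for the scraper to pull.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders()
                    .set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The service only ever increments the counter; per-second rates are left to the Prometheus side (for example with `rate()` over the scraped counter), which is why the endpoint exposes a running total rather than a rate.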

The usual suspects for metrics are throughput, latency, and errors. I'm not clear yet what the most significant resources/operations are for SuperTokens, but offhand that might be:

  • New sessions/second
  • Session data lookups/second
  • Session logouts(wipes?)/second
  • Response time for each of the above
    • Expressed as P50, P95 and P99 percentiles
  • Errors/second (of all types, or broken down by operation or error type e.g. DB connect failures)
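The list above could be covered per operation by one latency histogram plus one error counter. The sketch below is an assumption-laden illustration using only the JDK: the class, the metric names, the `operation` label, and the bucket boundaries are all hypothetical and would need tuning against real SuperTokens latencies. It stores cumulative bucket counts in the style of a Prometheus histogram, so percentiles are estimated at query time rather than computed in the service.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: metric names and bucket bounds are illustrative only.
class OperationMetricsSketch {
    // Bucket upper bounds in seconds (an assumption; tune for real latencies).
    static final double[] BUCKETS = {0.005, 0.025, 0.1, 0.5, 2.5};

    final String operation;
    final LongAdder errors = new LongAdder();
    final DoubleAdder sum = new DoubleAdder();
    // One slot per bucket, plus a final overflow slot for values above the last bound.
    final AtomicLongArray bucketCounts = new AtomicLongArray(BUCKETS.length + 1);

    OperationMetricsSketch(String operation) {
        this.operation = operation;
    }

    // Record one completed request: its duration and whether it failed.
    void observe(double seconds, boolean failed) {
        int i = 0;
        while (i < BUCKETS.length && seconds > BUCKETS[i]) {
            i++;
        }
        bucketCounts.incrementAndGet(i);
        sum.add(seconds);
        if (failed) {
            errors.increment();
        }
    }

    // Render histogram and error counter in the Prometheus text format.
    String render() {
        String name = "supertokens_request_duration_seconds";
        String label = "operation=\"" + operation + "\"";
        StringBuilder sb = new StringBuilder();
        long cumulative = 0;
        for (int i = 0; i < BUCKETS.length; i++) {
            cumulative += bucketCounts.get(i); // buckets are cumulative on output
            sb.append(name).append("_bucket{").append(label)
              .append(",le=\"").append(BUCKETS[i]).append("\"} ")
              .append(cumulative).append('\n');
        }
        cumulative += bucketCounts.get(BUCKETS.length);
        sb.append(name).append("_bucket{").append(label)
          .append(",le=\"+Inf\"} ").append(cumulative).append('\n');
        sb.append(name).append("_sum{").append(label).append("} ")
          .append(sum.sum()).append('\n');
        sb.append(name).append("_count{").append(label).append("} ")
          .append(cumulative).append('\n');
        sb.append("supertokens_request_errors_total{").append(label).append("} ")
          .append(errors.sum()).append('\n');
        return sb.toString();
      }
}
```

With data in this shape, throughput would come from `rate(supertokens_request_duration_seconds_count[5m])`, and a P95 estimate from `histogram_quantile(0.95, sum by (le) (rate(supertokens_request_duration_seconds_bucket[5m])))`. The percentile accuracy depends on how well the bucket boundaries match the real latency distribution.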

Other interesting metrics relate to load-shedding (how many times per second we send a 429 Too Many Requests, presumably with a Retry-After header) and to backpressure from another system that is itself shedding load. For example, did DB transaction latency go up? That would probably correlate directly with our own latency going up.
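One way the load-shedding signal might be captured is a response counter labeled by HTTP status, so that 429s can be graphed next to everything else. This is a sketch only: the class, the metric name `supertokens_http_responses_total`, and the `status` label are assumptions, and whether supertokens-core sheds load with 429s at all is not established here.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: counts responses by HTTP status code.
class ResponseStatusSketch {
    static final Map<Integer, LongAdder> byStatus = new ConcurrentHashMap<>();

    // Call once per response sent, e.g. record(429) when shedding load.
    static void record(int status) {
        byStatus.computeIfAbsent(status, s -> new LongAdder()).increment();
    }

    // Render one counter sample per status code, sorted for stable output.
    static String render() {
        StringBuilder sb = new StringBuilder("# TYPE supertokens_http_responses_total counter\n");
        Map<Integer, LongAdder> sorted = new TreeMap<>(byStatus);
        sorted.forEach((status, n) ->
            sb.append("supertokens_http_responses_total{status=\"")
              .append(status).append("\"} ").append(n.sum()).append('\n'));
        return sb.toString();
    }
}
```

Downstream backpressure such as DB transaction latency could reuse the same histogram approach as request latency, with a separate metric name, so the two can be compared on one graph.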

sayotte, May 14 '21 16:05