SLIs and SLOs with Prometheus and Grafana for your APIs managed by Tyk

About

This is a demo project running on Docker that shows how to configure Tyk Gateway, Tyk Pump, Prometheus and Grafana OSS to set up a dashboard with SLIs and SLOs for your APIs managed by Tyk.

You can use it to explore the Prometheus metrics exposed by Tyk Pump and use them in a Grafana dashboard.

(Screenshot: "SLOs for APIs managed by Tyk" dashboard in Grafana)

Deploy and run the demo

  1. Clone this repository:
git clone https://github.com/TykTechnologies/demo-slo-prometheus-grafana.git
  2. Start the services:
cd ./demo-slo-prometheus-grafana/
docker compose up -d
  3. Verify that all services are running:
docker compose ps
  4. Generate traffic:

K6 is used to generate traffic to the API endpoints. The load script load.js runs for 15 minutes.

docker compose run k6 run /scripts/load.js

You will see K6 output in your terminal:

(Screenshot: K6 output in the terminal)
  5. Check out the dashboard in Grafana:

Go to Grafana in your browser (initial user/pwd: admin/admin) and open the dashboard called SLOs for APIs managed by Tyk.

You should see the data coming in:

(Screenshot: Grafana dashboard with initial data)

You can also filter the data per API:

(Screenshot: Grafana dashboard filtered by API)

Tear down

Stop the services

docker compose stop

Remove the services

docker compose down

How this works

(Diagram: slo_grafana)

Configuration

  • Tyk API Gateway is configured to expose two API endpoints:
    • httpbin (see .json config)
    • httpstatus (see .json config)
  • K6 will use the load script load.js to generate demo traffic to the API endpoints
  • Tyk Pump is configured to expose a metrics endpoint for Prometheus (see config) with two custom metrics, tyk_http_requests_total and tyk_http_latency. Custom metrics require Tyk Pump version 1.6 or later.
  • Prometheus
    • prometheus.yml is configured to automatically scrape Tyk Pump's metric endpoint
    • slos.rules.yml is used to calculate additional metrics needed for the remaining error budget
  • Grafana
    • prometheus_ds.yml is configured to connect Grafana automatically to Prometheus
    • SLOs-for-APIs-managed-by-Tyk.json is the dashboard definition
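Once Prometheus is scraping Tyk Pump, the custom metrics can also be inspected outside Grafana through the Prometheus HTTP API. The following is a minimal sketch, not part of the demo. It assumes Prometheus is reachable at http://localhost:9090 (its default port; the port published by this demo's compose file may differ), and the helper names build_query_url, parse_vector and query are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus listens on its default port on localhost.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> list[tuple[dict, float]]:
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    # Each sample carries its value as [timestamp, "value-as-string"].
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


def query(promql: str, base_url: str = PROMETHEUS_URL) -> list[tuple[dict, float]]:
    """Run an instant query against a live Prometheus server."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_vector(json.load(resp))
```

With the demo running and traffic flowing, calling query("sum(rate(tyk_http_requests_total[10m]))") should return the overall request rate over the last 10 minutes, provided the assumed URL matches your setup.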

SLIs and SLOs

The definitions and examples are inspired by https://sre.google/workbook/slo-document/, https://landing.google.com/sre/workbook/chapters/alerting-on-slos/ and https://github.com/google/prometheus-slo-burn-example/blob/master/prometheus/slos.rules.yml.

You will see different indicators displayed on the Grafana dashboard.

To calculate the SLO and the displayed error budget remaining, we use the following SLI/SLO:

  • SLI: the proportion of successful HTTP requests, as measured from Tyk API Gateway
    • Any HTTP status other than 500–599 is considered successful.
    • count of http_requests which do not have a 5XX status code divided by count of all http_requests
  • SLO: 95% successful requests
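To make the SLI definition above concrete, here is a small sketch that computes it from request counts grouped by HTTP status code. The function name availability_sli and the sample counts are made up for illustration. Treating a window with no traffic as fully successful is an assumption, not something the demo specifies.

```python
SLO_TARGET = 0.95  # 95% successful requests, as stated above


def availability_sli(requests_by_status: dict[int, int]) -> float:
    """Proportion of requests whose status code is outside 500-599."""
    total = sum(requests_by_status.values())
    if total == 0:
        # Assumption: no traffic is treated as meeting the objective.
        return 1.0
    good = sum(
        count
        for status, count in requests_by_status.items()
        if not 500 <= status <= 599
    )
    return good / total


# Made-up sample: 1000 requests, 40 of them with a 5XX status.
counts = {200: 940, 404: 20, 500: 30, 503: 10}
sli = availability_sli(counts)
print(f"SLI = {sli:.3f}, SLO met: {sli >= SLO_TARGET}")
```

Note that 4XX responses count as successful under this definition, since only 5XX statuses are treated as errors.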

In slos.rules.yml, we calculate the error rate per request over the last 10 minutes in job:slo_errors_per_request:ratio_rate10m. With job:error_budget:remaining, we calculate the remaining error budget as a percentage, which is what the Grafana dashboard displays. The dashboard uses a threshold of 95% (every value below 95% is shown in red).
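The arithmetic behind a remaining error budget can be sketched as follows. This uses one common definition, where the budget is the allowed error ratio (1 minus the SLO target) and the remaining budget is the unspent share of it. The exact expression in slos.rules.yml is not reproduced here and may differ, and the function name error_budget_remaining is hypothetical.

```python
def error_budget_remaining(error_ratio: float, slo_target: float = 0.95) -> float:
    """Percentage of the error budget still unspent over the window.

    Assumed definition: 100 * (1 - error_ratio / (1 - slo_target)).
    """
    if not 0.0 <= slo_target < 1.0:
        raise ValueError("slo_target must be in [0, 1)")
    budget = 1.0 - slo_target  # allowed error ratio, 5% for a 95% SLO
    return 100.0 * (1.0 - error_ratio / budget)


# Made-up error ratios standing in for job:slo_errors_per_request:ratio_rate10m.
for ratio in (0.0, 0.02, 0.05, 0.10):
    remaining = error_budget_remaining(ratio)
    print(f"error ratio {ratio:.2f} -> {remaining:.1f}% of budget remaining")
```

Under this definition, an error ratio of 2% against a 5% budget leaves about 60% of the budget, and a negative result means the budget for the window is overspent.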

Contribute

You are welcome to contribute to this project.

Support, questions & feedback

This is a demo project. It currently uses release candidate (RC) versions of Tyk Gateway and Tyk Pump.

For questions about our products, please use the Tyk Community forum.
Clients can also use [email protected].
Potential clients and evaluators, please use [email protected].