unleash Add Grafana dashboard

Describe the feature request

We want to monitor an on-prem Unleash installation with Prometheus and view the collected metrics in a Grafana dashboard.

Background

No response

Solution suggestions

It would be awesome to have a ready-to-use Grafana dashboard. That would make it much easier to observe Unleash out of the box.

Nov 27 '22 19:11 zamazan4ik

I assume a ready-to-use dashboard is something you would use with dashboard management (import/export). That seems like a common setup, and a template can definitely help. For hosted Unleash our needs are a bit different, since we support many clients in multiple regions, so I don't think our dashboards would be helpful as a starter. Maybe we can share some queries and chart definitions. CC @chriswk

This could work as a blog post or snippet, because I don't feel like we're able to support it long term in the repository.

Nov 28 '22 14:11 Tymek

Hi @zamazan4ik I'm trying to understand the use case: what kind of metrics are you interested in?

We do expose some application-level metrics (API docs) that could be scraped by Prometheus and charted in Grafana. Are these the kind of metrics you wanted? Or are you interested in other types of metrics (more operational, such as CPU and memory)?

Dec 26 '22 08:12 gastonfournier

@gastonfournier I am interested in both of them. Application-level metrics are interesting for more business-aligned stakeholders, I suppose. (Just a note: the description of the application-level metrics could be improved, I guess. It's not clear what each metric describes.)

Operational metrics are interesting for the Unleash maintainers (admins, DevOps, etc.). Memory/CPU usage of the whole process (or of a group of processes/microservices; I am not familiar with the whole Unleash stack yet) would be useful for them. If you know of more metrics that are useful for Unleash admins, it would be awesome to add them to the dashboard too.

Dec 26 '22 08:12 zamazan4ik

We do build our dashboards for our operations based on metrics exposed by one endpoint. I just double-checked because I did not remember if it was open-sourced, and yes it is: https://docs.getunleash.io/reference/api/legacy/unleash/internal/prometheus

What we do is have a Prometheus instance scraping this endpoint, and we build Grafana dashboards based on that information. Recently, I've set this up to test some new metrics and this is the configuration I've used:

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'example'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configurations: Prometheus itself, the node exporter (if installed),
# and a local Unleash instance.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    scrape_timeout: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: node
    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    static_configs:
      - targets: ['localhost:9100']

  - job_name: local_unleash
    metrics_path: /internal-backstage/prometheus
    static_configs:
      - targets: ['localhost:4242']

Of course, this is for local testing, and how you configure it might vary depending on your environment.
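To see what the endpoint exposes before building any charts, something like the following could help. It is a minimal sketch, not part of Unleash: it assumes the endpoint returns the standard Prometheus text exposition format and that it is reachable at the local URL implied by the config above. The function names are made up for illustration.

```python
import urllib.request

# Assumption: host, port, and path taken from the local_unleash job above.
METRICS_URL = "http://localhost:4242/internal-backstage/prometheus"


def describe_metrics(exposition_text):
    """Map each metric name to its type and help text.

    Only the '# HELP <name> <text>' and '# TYPE <name> <type>' comment lines
    of the Prometheus text exposition format are read; samples are ignored.
    """
    metrics = {}
    for line in exposition_text.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "#" or parts[1] not in ("HELP", "TYPE"):
            continue
        _, keyword, name, text = parts
        entry = metrics.setdefault(name, {"type": "untyped", "help": ""})
        entry[keyword.lower()] = text.strip()
    return metrics


def print_metric_descriptions(url=METRICS_URL, timeout=5):
    """Fetch the endpoint and print one line per metric (needs a running instance)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    for name, info in sorted(describe_metrics(text).items()):
        print(f"{name} ({info['type']}): {info['help']}")
```

Calling `print_metric_descriptions()` against a running instance should list every metric with its type and help string, which is a quick way to check which names are available in your version before writing queries.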

Now, on the type of dashboards you can have, it might be interesting to have a repository with community-maintained Grafana dashboards. Most of ours are built around multitenancy, so we'd have to strip out client-id variables and other infra-specific parts (such as our API gateway of choice or our cloud provider metrics) for them to be useful for on-prem installations.

I went over our operational dashboards and they need some work, but a good starting point could be a list of things to monitor. I can start with a first draft (feel free to suggest other metrics):

Total requests per second per URL (line chart showing the change over time)
Error requests per second per URL (line chart showing the change over time)
Process CPU usage (line chart showing the change over time)
Process memory usage (line chart showing the change over time)
Database connection pool (line chart showing the change over time)
Event loop lag (99th percentile line chart showing the change over time)

Others that would require Prometheus node exporter:

Instance CPU utilization (line chart showing the change over time)
Instance load (line chart showing the change over time)
Instance memory usage (line chart showing the change over time)
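As a rough illustration of how the lists above could turn into an importable dashboard, here is a minimal sketch that generates Grafana dashboard JSON with one time series panel per item. Every metric and label name in `PANELS` is an assumption (typical Node.js client and node exporter names, plus guesses at Unleash's own request and pool metrics) and should be checked against what your endpoint actually exposes. The `5..` status filter is just one possible definition of an error request, and Grafana may need extra fields, such as a data source, depending on your setup.

```python
import json

# (panel title, PromQL expression). All metric and label names below are
# assumptions to verify against your own /internal-backstage/prometheus output.
PANELS = [
    ("Requests per second per path",
     "sum by (path) (rate(http_request_duration_milliseconds_count[5m]))"),
    ("Error requests per second per path",
     'sum by (path) (rate(http_request_duration_milliseconds_count{status=~"5.."}[5m]))'),
    ("Process CPU usage", "rate(process_cpu_seconds_total[5m])"),
    ("Process memory usage", "process_resident_memory_bytes"),
    ("Database connection pool (used)", "db_pool_used"),
    ("Event loop lag (p99)", "nodejs_eventloop_lag_p99_seconds"),
    # The remaining panels need the Prometheus node exporter.
    ("Instance CPU utilization",
     '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))'),
    ("Instance load (1m)", "node_load1"),
    ("Instance memory usage",
     "1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes"),
]


def build_dashboard(title, panels, columns=2, width=12, height=8):
    """Lay panels out left to right on Grafana's 24-unit-wide grid."""
    grafana_panels = []
    for index, (panel_title, expr) in enumerate(panels):
        grafana_panels.append({
            "id": index + 1,
            "type": "timeseries",
            "title": panel_title,
            "gridPos": {
                "x": (index % columns) * width,
                "y": (index // columns) * height,
                "w": width,
                "h": height,
            },
            "targets": [{"refId": "A", "expr": expr}],
        })
    return {"title": title, "panels": grafana_panels}


# Serialized form, ready to be written to a file and imported into Grafana.
dashboard_json = json.dumps(build_dashboard("Unleash (sketch)", PANELS), indent=2)
```

Keeping the queries in a plain list makes it easy to add, remove, or correct a panel without touching the layout code.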

Let me know if this helps. I'll bring this to the team when we get together next year.

Dec 26 '22 09:12 gastonfournier

> Let me know if this helps. I'll bring this to the team when we get together next year.

Yes, this helps a lot!

> I went over our operational dashboards and they need some work but a good starting point could be a list of things to monitor, I can start with the first draft (feel free to suggest other metrics)

That would be awesome. It's much easier for users to just download Grafana dashboards and import them into their local setup.

Thanks in advance!

Dec 26 '22 10:12 zamazan4ik

Maybe a simple, bare-bones example covering the most important metrics could be added to https://grafana.com/grafana/dashboards/

Feb 07 '23 14:02 gastonfournier

Is this open for grabs?

Oct 31 '23 13:10 rakshitgondwal

Hi @rakshitgondwal, we haven't prioritized this yet, so any contribution is welcome.

Nov 03 '23 08:11 gastonfournier

Hello @gastonfournier, I found your suggestions on the Grafana dashboard, and on the kinds of metrics Prometheus could scrape at both the application level and the operational (infrastructure) level, insightful. But I do think the data to visualize in Grafana depends heavily on the organization. There are commonly scraped operational metrics such as CPU, memory, and storage, but many other processes can be monitored, depending on the organization's needs. I would suggest that @zamazan4ik carry out a survey within the organization to determine what kind of data suits its business needs.

Jan 21 '24 17:01 ogunleye0720