Spring Boot Actuator - Visual Monitoring with Prometheus & Grafana

Open TFdream opened this issue 4 years ago • 0 comments

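Spring Monitoring Mechanism

Before looking at how to monitor a Java application, it helps to understand how monitoring works in Spring Boot. Before Spring Boot 2.x, the Actuator module handled metrics with its own implementation; from Spring Boot 2.x onward, Actuator's metrics are built on Micrometer.

The Spring Boot Actuator module provides production-ready features such as health checks, auditing, metrics collection, and HTTP tracing, which help us monitor and manage a Spring Boot application. It collects information from inside the application and exposes it to the outside; all of the features above are accessible over HTTP and JMX.

Since Spring Boot 2.x, Actuator uses Micrometer to integrate with external application monitoring systems, so hooking up an external monitoring system takes very little configuration.

So what is Micrometer?

Micrometer provides a common API for collecting performance data on the Java platform. An application only needs to record its metrics through this common API, and Micrometer handles the adaptation to different monitoring systems. This makes switching monitoring systems easy.

Put simply, Actuator is the module that actually collects the data, while Micrometer acts more like an adapter that passes the data Actuator collects to the various monitoring tools.

Spring Actuator Quick Start

Add the Actuator dependency to pom.xml: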

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!-- actuator -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>

After starting the project, visit localhost:8080/actuator/health:

{
    "status":"UP",
    "components":{
        "diskSpace":{
            "status":"UP",
            "details":{
                "total":250790436864,
                "free":114159812608,
                "threshold":10485760,
                "exists":true
            }
        },
        "ping":{
            "status":"UP"
        }
    }
}

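Besides the basic health check, Actuator provides many other endpoints. Together they cover most of what we need to know about the state of a running application.

Endpoint Exposure Configuration

Unlike Actuator 1.x, Actuator 2.x does not expose most endpoints over HTTP by default. An endpoint must be exposed through configuration before we can access it.

The following properties control which endpoints are exposed over JMX and HTTP.

Property	Default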
management.endpoints.jmx.exposure.exclude
management.endpoints.jmx.exposure.include	*
management.endpoints.web.exposure.exclude
management.endpoints.web.exposure.include	info, health

We can expose all endpoints, for example:

management.endpoints.web.exposure.include=*

We can also expose only a subset. For example, the following setting excludes the beans and trace endpoints.

management.endpoints.web.exposure.exclude=beans,trace

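By default, all Actuator endpoints are served under /actuator/*. This path can be customized if needed.

For example, the following setting changes the prefix to monitor, so the access path becomes /monitor/*.

management.endpoints.web.base-path=/monitor

Here we add the following to application.yml to expose all endpoints.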

management:
  endpoints:
    web:
      exposure:
        include: '*'

Next, visit localhost:8080/actuator/metrics to see the names of all available metrics.

{
    "names":[
        "jvm.buffer.count",
        "jvm.buffer.memory.used",
        "jvm.buffer.total.capacity",
        "jvm.classes.loaded",
        "jvm.classes.unloaded",
        "jvm.gc.live.data.size",
        "jvm.gc.max.data.size",
        "jvm.gc.memory.allocated",
        "jvm.gc.memory.promoted",
        "jvm.gc.pause",
        "jvm.memory.committed",
        "jvm.memory.max",
        "jvm.memory.used",
        "jvm.threads.daemon",
        "jvm.threads.live",
        "jvm.threads.peak",
        "jvm.threads.states",
        "logback.events",
        "process.cpu.usage",
        "process.files.max",
        "process.files.open",
        "process.start.time",
        "process.uptime",
        "system.cpu.count",
        "system.cpu.usage",
        "system.load.average.1m",
        "tomcat.sessions.active.current",
        "tomcat.sessions.active.max",
        "tomcat.sessions.alive.max",
        "tomcat.sessions.created",
        "tomcat.sessions.expired",
        "tomcat.sessions.rejected"
    ]
}

To inspect the process.cpu.usage metric, visit localhost:8080/actuator/metrics/process.cpu.usage to see its details.

{
    "name":"process.cpu.usage",
    "description":"The \"recent cpu usage\" for the Java Virtual Machine process",
    "baseUnit":null,
    "measurements":[
        {
            "statistic":"VALUE",
            "value":0
        }
    ],
    "availableTags":[

    ]
}
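
The same lookup can be scripted. The following is a minimal sketch using the java.net.http client from JDK 11 or later. MetricFetcher and its methods are hypothetical names introduced for illustration, and the sketch assumes the application is running on localhost:8080 with the metrics endpoint exposed under the default /actuator base path.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper (not part of Actuator): reads one metric over HTTP.
final class MetricFetcher {

    // Builds the URL of a single metric under the default /actuator base path.
    static URI metricUri(String baseUrl, String metricName) {
        return URI.create(baseUrl + "/actuator/metrics/" + metricName);
    }

    // Returns the raw JSON body; throws if the endpoint does not answer with 200.
    static String fetch(String baseUrl, String metricName)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(metricUri(baseUrl, metricName))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the demo application is running locally on port 8080.
        System.out.println(fetch("http://localhost:8080", "process.cpu.usage"));
    }
}
```

If the base path has been customized, the "/actuator" segment in metricUri would need to change accordingly.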

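Common Endpoints

Spring Boot Actuator provides endpoints through which the outside world can inspect and interact with the application.

For example, the /health endpoint provides basic information about the application's health, and the /metrics endpoint provides useful application metrics (JVM memory usage, system CPU usage, and so on).

Endpoints generally fall into a few categories:

Application configuration: information closely tied to the Spring Boot application itself, such as the loaded application configuration, environment variables, and the auto-configuration report.
Metrics: measurements used for monitoring while the application runs, such as memory usage, thread pool information, and HTTP request statistics.
Operation control: operational functions such as shutting down the application.

For details on the built-in endpoints, refer to the official documentation; they are not repeated here.

The health Endpoint

The /health endpoint aggregates the application's health indicators to report its overall health. How much information it exposes depends on the property management.endpoint.health.show-details, which accepts never (the default), when-authorized, or always.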
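
The sample responses in this article include a details section for each component, which is what the always setting produces; with never, only the overall status is returned. A minimal application.properties fragment, assuming it is acceptable to show details to every caller (which may not be appropriate for an application reachable from outside):

```properties
# Show component details in /actuator/health for every caller.
# Assumption: the endpoint is only reachable from a trusted network.
management.endpoint.health.show-details=always
```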

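We can also disable the health check for an individual component through configuration. For example, the following setting disables the MongoDB health check.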

management.health.mongo.enabled: false

Or we can disable all auto-configured health indicators:

management.health.defaults.enabled: false

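Besides the automatically registered health indicators, we can define a custom one by implementing the HealthIndicator interface or extending the AbstractHealthIndicator class.

For example, the ThreadHealthIndicator class below extends AbstractHealthIndicator and reports thread pool information.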

import org.springframework.boot.actuate.health.AbstractHealthIndicator;
import org.springframework.boot.actuate.health.Health;
import org.springframework.stereotype.Component;

import javax.annotation.Resource;
import java.util.concurrent.ThreadPoolExecutor;

/**
 * @author Ricky Fung
 */
@Component
public class ThreadHealthIndicator extends AbstractHealthIndicator {
    @Resource
    private ThreadPoolExecutor threadPoolExecutor;
    
    @Override
    protected void doHealthCheck(Health.Builder builder) throws Exception {
        // Use the builder to assemble the health status details.
        // If an exception is thrown, the status is set to DOWN and the exception message is recorded.
        builder.up()
                .withDetail("tp.coreSize", threadPoolExecutor.getCorePoolSize())
                .withDetail("tp.maxSize", threadPoolExecutor.getMaximumPoolSize())
                .withDetail("tp.poolSize", threadPoolExecutor.getPoolSize())
                .withDetail("tp.largestPoolSize", threadPoolExecutor.getLargestPoolSize())
                .withDetail("tp.activeCount", threadPoolExecutor.getActiveCount())
                .withDetail("tp.completedTaskCount", threadPoolExecutor.getCompletedTaskCount())
                .withDetail("tp.taskCount", threadPoolExecutor.getTaskCount());
    }
}

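Restart the application and visit localhost:8080/actuator/health to see the custom health information. The indicator appears under the key thread, which Spring Boot derives from the bean name by dropping the HealthIndicator suffix.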

{
    "status":"UP",
    "components":{
        "diskSpace":{
            "status":"UP",
            "details":{
                "total":250790436864,
                "free":114128850944,
                "threshold":10485760,
                "exists":true
            }
        },
        "ping":{
            "status":"UP"
        },
        "thread":{
            "status":"UP",
            "details":{
                "tp.coreSize":2,
                "tp.maxSize":5,
                "tp.poolSize":0,
                "tp.largestPoolSize":0,
                "tp.activeCount":0,
                "tp.completedTaskCount":0,
                "tp.taskCount":0
            }
        }
    }
}
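
The zeros reported for tp.poolSize and the other counters are expected right after startup, because a ThreadPoolExecutor creates its threads lazily as tasks arrive. The following JDK-only sketch illustrates this. The article does not show how the injected ThreadPoolExecutor bean is defined, so PoolStatsDemo is a hypothetical class: its core and maximum sizes are chosen to match the output above, while the keep-alive time and queue type are assumptions.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not the bean definition used by the article.
final class PoolStatsDemo {

    // Core size 2 and maximum size 5 match the sample health output.
    // The 60 second keep-alive and unbounded queue are assumptions.
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                2, 5, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = newPool();
        // No worker threads exist until the first task is submitted.
        System.out.println("poolSize before any task: " + pool.getPoolSize());

        // Submitting a task starts one core thread; wait for the task to finish.
        pool.submit(() -> { }).get();
        System.out.println("poolSize after one task: " + pool.getPoolSize());

        pool.shutdown();
    }
}
```

Once the application starts handing work to the pool, the same getters used by the health indicator begin to report non-zero values.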

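Other endpoints include metrics, loggers, info, beans, heapdump, threaddump, and shutdown; see the official documentation for details.

Monitoring with Prometheus + Grafana

The only difference from the project above is the additional micrometer-registry-prometheus dependency.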

        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
            <version>1.5.14</version>
        </dependency>

After adding the dependency, add the following to application.properties to enable the relevant endpoints.

management.endpoint.metrics.enabled=true
management.endpoints.web.exposure.include=*
management.endpoint.prometheus.enabled=true
management.metrics.export.prometheus.enabled=true

Then start the project and visit localhost:8080/actuator/prometheus. The application's metrics are now output in the Prometheus text exposition format.
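
In this output each sample is one line of the form name{label="value",...} value, and Micrometer rewrites dotted metric names with underscores, so process.cpu.usage appears as process_cpu_usage. The sketch below parses such a line to show its structure. PrometheusLine is a hypothetical helper, the sample line is illustrative rather than captured from a real run, and the parser is deliberately simplified: it skips nothing, so # HELP and # TYPE comment lines must be filtered out first, and it does not handle timestamps, escaped quotes, or +Inf values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; real tooling should use a full parser.
final class PrometheusLine {

    // Matches label pairs such as area="heap"; escaped quotes are not handled.
    private static final Pattern LABEL = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    final String name;
    final Map<String, String> labels;
    final double value;

    private PrometheusLine(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }

    // Parses one sample line; assumes the value is the last token on the line.
    static PrometheusLine parse(String line) {
        String trimmed = line.trim();
        int lastSpace = trimmed.lastIndexOf(' ');
        String head = trimmed.substring(0, lastSpace).trim();
        double value = Double.parseDouble(trimmed.substring(lastSpace + 1));

        // Everything before the optional '{' is the metric name.
        int brace = head.indexOf('{');
        String name = brace < 0 ? head : head.substring(0, brace);

        Map<String, String> labels = new LinkedHashMap<>();
        if (brace >= 0) {
            Matcher m = LABEL.matcher(head.substring(brace));
            while (m.find()) {
                labels.put(m.group(1), m.group(2));
            }
        }
        return new PrometheusLine(name, labels, value);
    }

    public static void main(String[] args) {
        // Illustrative sample line, not taken from a real application.
        PrometheusLine sample = parse(
                "jvm_memory_used_bytes{area=\"heap\",id=\"PS Eden Space\"} 1.2345E7");
        System.out.println(sample.name + " " + sample.labels + " = " + sample.value);
    }
}
```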

2.1 Installing Prometheus

For installing Prometheus, see this article: https://mp.weixin.qq.com/s/mbe6Q6L8IZ9x0gEq7HX5lg

This article installs Prometheus with Docker, which is simpler. Just run the following command:

$ sudo docker run -d -p 9090:9090 prom/prometheus

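Usually we also specify the location of the configuration file:

$ sudo docker run -d --name=prometheus -p 9090:9090 \
    -v ~/docker/prometheus/:/etc/prometheus/ \
    prom/prometheus

We keep the configuration file locally at ~/docker/prometheus/prometheus.yml so that it is easy to edit and view. The -v option mounts the local directory at /etc/prometheus/, which is where Prometheus loads its configuration file from by default inside the container.

If we are not sure where the default configuration file is, we can first run the command above without the -v option, then use the docker inspect command to see the container's default runtime arguments (the Args field in the output):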

$ sudo docker inspect 0c

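2.2 Configuring Prometheus

As the previous sections show, Prometheus has a configuration file, specified with the --config.file flag, in YAML format. Let's open prometheus.yml and look at its contents. The file below is the default configuration with a demo-application job added for our Spring Boot application: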

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'demo-application'
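    # Scrape path
    metrics_path: '/actuator/prometheus'
    # Target server. If Prometheus runs in Docker, 127.0.0.1 refers to the
    # container itself, so use an address reachable from the container.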
    static_configs:
    - targets: ['127.0.0.1:8080']

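The default Prometheus configuration file has four main blocks:

global block: Prometheus's global settings. For example, scrape_interval is how often Prometheus scrapes data, and evaluation_interval is how often it evaluates rules, including alerting rules;
alerting block: Alertmanager settings, not covered in this article;
rule_files block: alerting rule files, not covered in this article;
scrape_configs block: defines the targets Prometheus scrapes. A job named prometheus is configured by default because Prometheus also exposes its own metrics over HTTP when it starts, so in effect Prometheus monitors itself. This is of little use in real deployments, but it is a handy example for learning how Prometheus works. Visit http://localhost:9090/metrics to see which metrics Prometheus exposes.

Once Prometheus has started, open http://127.0.0.1:9090/targets in a browser to see all targets scraped by the Prometheus jobs.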

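2.4 Installing and Configuring Grafana

The Web UI that ships with Prometheus can display graphs of individual metrics, but it is very basic and only suitable for debugging. A capable monitoring system also needs customizable panels for different metrics and support for different kinds of visualization (line charts, pie charts, heat maps, TopN, and so on). This is what dashboards are for.

Prometheus therefore developed its own dashboard system, PromDash, but it was soon deprecated. The project now recommends Grafana for visualizing Prometheus metrics, both because Grafana is very powerful and because it integrates smoothly with Prometheus.

Grafana is an open-source system for visualizing large volumes of measurement data. It is powerful and has a polished interface. With it you can build custom dashboards and configure, per panel, which data to display and how. It supports many data sources, such as Graphite, InfluxDB, OpenTSDB, Elasticsearch, and Prometheus, as well as a large number of plugins.

Let's try using Grafana to display Prometheus metrics. First we install Grafana, using the simplest approach, Docker:

$ docker run -d --name=grafana -p 3000:3000 grafana/grafana

After running the docker command above, Grafana is installed. Other installation methods are described in the official installation documentation. Once it is installed, visit http://localhost:3000/ to reach the Grafana login page and sign in with the default username and password (admin/admin).

Next we use the "JVM (Micrometer)" dashboard template from the Dashboards section of the Grafana website to display the application's metrics. Its page, JVM (Micrometer) dashboard for Grafana | Grafana Labs, shows the dashboard ID: 4701.

Then, in the Grafana UI, click the "Import" menu to open the import settings page.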
