prometheus-exporter-plugin-for-opensearch
[Tutorial] Write a complete tutorial on how to set up OpenSearch with the plugin in K8s and have Prometheus scraping it
There is a lack of a complete tutorial on how to set up an OpenSearch cluster with the plugin in K8s and have Prometheus scraping the metrics endpoint.
See: https://forum.opensearch.org/t/prometheus-not-able-to-scrape-metrics-on-pod/16908/
Idea: this setup flow should be part of the plugin's release process, or even the CI (?)
Is there any progress on this task? I would like to use Prometheus to scrape OpenSearch metrics and use Grafana dashboards for monitoring.
This tutorial is very much needed; I've been through several attempts to get Prometheus to scrape an endpoint on Kubernetes with no success.
Just for the record the following is a Slack thread we had with @smbambling on this topic: https://opensearch.slack.com/archives/C051JEH8MNU/p1715262647976709
I've attempted to configure a scrape endpoint for Prometheus to OpenSearch's _prometheus/metrics via two separate methods.
Notes:
- kube-prometheus-stack is used to deploy Prometheus, Grafana, etc
- OpenSearch Helm chart is used to deploy OpenSearch
- Additional security configs (i.e. internal users, bindings, index management, etc.) are applied via a custom OpenSearch-Helper Helm chart
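For context on the setup above: the exporter plugin itself can be installed at pod startup through the OpenSearch Helm chart's plugin values. A minimal sketch, assuming the chart exposes a plugins.installList option and using a placeholder artifact URL (pick the plugin release that matches your OpenSearch version):

plugins:
  enabled: true
  installList:
    # Placeholder URL; substitute the prometheus-exporter release matching your OpenSearch version
    - https://github.com/<org>/prometheus-exporter-plugin-for-opensearch/releases/download/<version>/prometheus-exporter-<version>.zip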
Method 1: Static Prometheus configs
In this method I've modified the kube-prometheus-stack Helm values override in order to apply additional scrape configs.
In the values below I've tested multiple different combinations of configs:
- only insecure_skip_verify: true, no other tls_config options set
- insecure_skip_verify: false with ca_file set
- max_version: TLS12 both set and not set
- cert_file + key_file both set and not set
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: opensearch-job
        metrics_path: /_prometheus/metrics
        scheme: https
        static_configs:
          - targets:
              - opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200
        basic_auth:
          username: "admin"
          password: "myfakePW"
        tls_config:
          insecure_skip_verify: true
          max_version: TLS12
          ca_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/ca.crt
          cert_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/tls.crt
          key_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/tls.key
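Note that the ca_file/cert_file/key_file paths above assume the certificate Secret is mounted into the Prometheus pod. With kube-prometheus-stack this is typically done by listing the Secret under prometheusSpec.secrets, which the operator mounts at /etc/prometheus/secrets/<secret-name>. A minimal sketch, assuming the Secret is named my-internal-wildcard-my-tls-certs as in the paths above:

prometheus:
  prometheusSpec:
    # Mounted by the Prometheus Operator at /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/
    secrets:
      - my-internal-wildcard-my-tls-certs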
From another pod within the monitoring namespace where Prometheus is running (curl is not installed in the Prometheus container itself), I'm able to curl the internal service DNS name set above.
--- referencing the CA cert
$ curl -XGET --cacert /tmp/foo -u 'admin:myfakePW' 'https://opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200/_prometheus/metrics' | head
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0# HELP opensearch_jvm_mem_pool_max_bytes Maximum usage of memory pool
# TYPE opensearch_jvm_mem_pool_max_bytes gauge
opensearch_jvm_mem_pool_max_bytes{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",pool="survivor",} 0.0
AND
--- without referencing the CA cert
$ curl -k -u 'admin:tes+1Passw*rd2' 'https://opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200/_prometheus/metrics' | head
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0# HELP opensearch_indices_get_count Count of get commands
# TYPE opensearch_indices_get_count gauge
opensearch_indices_get_count{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",} 0.0
opensearch_indices_get_count{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-hot-data-0",nodeid="-Modhwt_TMiOd4f4rSSPhg",} 48.0
Method 2: Using Prometheus Service Monitor
In this method I've created a ServiceMonitor for kube-prometheus-stack to read and generate scrape targets.
Below is the output of my created ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: opensearch-master
    meta.helm.sh/release-namespace: opensearch
  creationTimestamp: "2024-05-08T14:51:02Z"
  generation: 12
  labels:
    app.kubernetes.io/component: opensearch-localk3s-cl1-master
    app.kubernetes.io/instance: opensearch-master
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opensearch
    app.kubernetes.io/version: 2.11.1
    helm.sh/chart: opensearch-2.17.0
    release: kube-prometheus-stack
  name: opensearch-service-monitor
  namespace: monitoring
  resourceVersion: "141672"
  uid: cf1df5d5-a855-4eb1-8cb5-da2ddaad99f6
spec:
  endpoints:
    - basicAuth:
        password:
          key: password
          name: opensearch-service-monitor-basic-auth
        username:
          key: username
          name: opensearch-service-monitor-basic-auth
      interval: 10s
      path: /_prometheus/metrics
      port: http
      scheme: https
      tlsConfig:
        ca: {}
        insecureSkipVerify: true
  namespaceSelector:
    matchNames:
      - opensearch
  selector:
    matchLabels:
      app.kubernetes.io/component: opensearch-localk3s-cl1-master
      app.kubernetes.io/instance: opensearch-master
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: opensearch
      app.kubernetes.io/version: 2.11.1
      helm.sh/chart: opensearch-2.17.0
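The basicAuth section above references a Secret named opensearch-service-monitor-basic-auth in the monitoring namespace. Its contents aren't shown in the thread, but it would look roughly like the following sketch (the credential values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: opensearch-service-monitor-basic-auth
  namespace: monitoring
type: Opaque
stringData:
  # Placeholder values; use the OpenSearch user Prometheus should authenticate as
  username: admin
  password: myfakePW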
Again, multiple different combinations of configs were tested within the ServiceMonitor, all providing the same end result: the scrape endpoints are created, but there is an SSL handshake issue for Prometheus.
As verification, I could also curl from the same pod used in Method 1 to the cluster IP endpoints generated via the ServiceMonitor.
$ curl -u 'admin:myfakePW' -k https://10.42.0.69:9200/_prometheus/metrics | head
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0# HELP opensearch_indices_refresh_total_time_seconds Time spent while refreshes
# TYPE opensearch_indices_refresh_total_time_seconds gauge
opensearch_indices_refresh_total_time_seconds{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",} 0.0
opensearch_indices_refresh_total_time_seconds{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-hot-data-0",nodeid="-Modhwt_TMiOd4f4rSSPhg",} 174.781
In the end, both methods produce SSL handshake errors for these targets in the Prometheus UI.
Thanks @smbambling for putting the effort into writing it all down.
In our testing setup we had a restricted cipher list in plugins.security.ssl.transport.enabled_ciphers; commenting this out allowed Prometheus to scrape the endpoints and gather data.
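For illustration, a rough sketch of what such a restriction in opensearch.yml might look like (the cipher shown is a placeholder, not the list used in the thread); the analogous HTTP-layer setting plugins.security.ssl.http.enabled_ciphers may also be worth checking, since Prometheus scrapes over the REST/HTTP layer:

# opensearch.yml (sketch): commenting out the restricted cipher list falls back to the
# defaults, which in this case allowed Prometheus to complete the TLS handshake.
#
# plugins.security.ssl.transport.enabled_ciphers:
#   - "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"   # placeholder cipher
#
# Equivalent setting for the REST/HTTP layer (the one Prometheus connects to):
# plugins.security.ssl.http.enabled_ciphers:
#   - "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"   # placeholder cipher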
I want to ask something: does this mean OpenSearch provides the metrics data to Prometheus, or does Prometheus provide the metrics data to OpenSearch?
@rarifz This installs an exporter that exposes metrics about OpenSearch, which Prometheus can then be configured to scrape.
Hello @smbambling, have you found a workaround? I tried with curl and it worked, but Prometheus cannot scrape metrics from the /_prometheus/metrics path.
FYI, other people using Prometheus can scrape successfully if the cluster is set up using only the HTTP protocol.
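As a rough sketch of that HTTP-only case (assuming the cluster exposes plain HTTP on the REST layer; basicAuth may still be required if the security plugin is enabled, and the port name must match your OpenSearch Service), the ServiceMonitor endpoint reduces to something like:

spec:
  endpoints:
    - interval: 10s
      path: /_prometheus/metrics
      port: http      # must match the named port on the OpenSearch Service
      scheme: http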
Hello @smbambling, do we have any workaround for people using HTTPS with basic auth enabled? We see that it's working with curl, but Prometheus cannot scrape metrics from the /_prometheus/metrics path and the target shows as down.