prometheus-adapter
Correct Configuration Fails to Provide Expected Custom Metrics in EKS
What happened?: We deployed the same Prometheus chart and Prometheus-Adapter chart to both an Alibaba Cloud ACK cluster and an AWS EKS cluster. The Prometheus and Prometheus-Adapter configurations are identical in the two Kubernetes clusters. The Prometheus scrape configuration is as follows:
job_name: basicai-business-queue-wait
metrics_path: /metrics/prometheus
scheme: http
scrape_interval: 30s
honor_labels: true
kubernetes_sd_configs:
  - role: service
    namespaces:
      names:
        - basicai-backend
        - basicai-stage-backend
relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_component]
    regex: dataset
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    target_label: 'kubernetes_namespace'
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_component]
    target_label: 'kubernetes_deployment'
    action: replace
  - source_labels: [__meta_kubernetes_service_port_number]
    regex: 80
    action: keep
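To make it easier to see which labels the scraped series end up with, here is a minimal Python sketch that mimics the relabel rules above for a single discovered service. It is a simplification, not Prometheus itself: the function name `relabel` and the sample input are hypothetical, and labels such as `job` and `instance` are ignored. It relies on the fact that Prometheus anchors relabel regexes, hence `re.fullmatch`.

```python
import re

COMPONENT = "__meta_kubernetes_service_label_app_kubernetes_io_component"


def relabel(meta):
    """Apply the scrape job's relabel rules to one discovered target.

    `meta` maps discovery labels (the __meta_* names) to their values.
    Returns the resulting target labels, or None if a keep rule drops the target.
    """
    labels = {}
    # labelmap: copy every service label to a plain label named by the capture group.
    for name, value in meta.items():
        match = re.fullmatch(r"__meta_kubernetes_service_label_(.+)", name)
        if match:
            labels[match.group(1)] = value
    # keep: only services whose component label is exactly "dataset".
    component = meta.get(COMPONENT, "")
    if not re.fullmatch("dataset", component):
        return None
    # replace: these two labels are the ones the adapter rule maps to resources.
    labels["kubernetes_namespace"] = meta.get("__meta_kubernetes_namespace", "")
    labels["kubernetes_deployment"] = component
    # keep: only the service port 80.
    if not re.fullmatch("80", meta.get("__meta_kubernetes_service_port_number", "")):
        return None
    return labels


# Hypothetical discovered service, for illustration only.
sample = {
    "__meta_kubernetes_namespace": "basicai-backend",
    COMPONENT: "dataset",
    "__meta_kubernetes_service_port_number": "80",
}
print(relabel(sample))
```

If the services in the EKS cluster carry a different `app.kubernetes.io/component` label or expose a different port, the keep rules would drop them and no series would carry `kubernetes_namespace` or `kubernetes_deployment`.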
The values.yaml for Prometheus-Adapter chart is as follows:
image:
  repository: registry.talos.basic.ai/common/images/prometheus-adapter
  tag: "v0.11.2"
  pullPolicy: IfNotPresent
prometheus:
  url: http://prometheus-server
  port: 80
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 100m
    memory: 1Gi
rules:
  default: false
  custom:
    - seriesQuery: '{__name__=~"basicai_job_replica_scale_percent",container!="POD",kubernetes_namespace!="",type="dataset-upload"}'
      resources:
        template: <<.Resource>>
        overrides:
          kubernetes_namespace: {resource: "namespace"}
          kubernetes_deployment: {resource: "deployment"}
      name:
        matches: "basicai_job_replica_scale_percent"
        as: "upload_job_replica_scale_percent_dataset"
      metricsQuery: last_over_time(basicai_job_replica_scale_percent{<<.LabelMatchers>>,type="dataset-upload"}[5m])
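For reference, the sketch below shows roughly what the rule's metricsQuery looks like once `<<.LabelMatchers>>` is filled in for one deployment. It only approximates the adapter's templating with plain string substitution: `render_query` is a hypothetical helper, the deployment name `dataset` is an assumption, and the order of the matchers the adapter actually emits may differ. Running the rendered query directly against Prometheus in each cluster is one way to confirm that the series exists with the expected labels.

```python
# The metricsQuery from the custom rule above, split across two literals.
METRICS_QUERY = (
    "last_over_time(basicai_job_replica_scale_percent"
    '{<<.LabelMatchers>>,type="dataset-upload"}[5m])'
)

# Label names taken from the rule's resources.overrides section.
RESOURCE_LABELS = {
    "namespace": "kubernetes_namespace",
    "deployment": "kubernetes_deployment",
}


def render_query(namespace, deployment):
    """Substitute label matchers for one namespaced deployment into the query."""
    matchers = ",".join(
        [
            f'{RESOURCE_LABELS["namespace"]}="{namespace}"',
            f'{RESOURCE_LABELS["deployment"]}="{deployment}"',
        ]
    )
    return METRICS_QUERY.replace("<<.LabelMatchers>>", matchers)


# Assumed names, for illustration only.
print(render_query("basicai-backend", "dataset"))
```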
In the Alibaba Cloud ACK cluster, the Prometheus-Adapter correctly provides custom metrics:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "deployments.apps/upload_job_replica_scale_percent_dataset",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/upload_job_replica_scale_percent_dataset",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "jobs.batch/upload_job_replica_scale_percent_dataset",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
However, in the EKS cluster, the Prometheus-Adapter exposes a large number of default metrics (even though rules.default is set to false) and does not include the expected 'upload_job_replica_scale_percent_dataset':
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq | head -n 50
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "services/authentication_duration_seconds_sum",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
.....
.....
.....
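Since the EKS listing is long and truncated here, a small filter can confirm whether the metric is really absent rather than just further down the output. This is a minimal sketch: `resources_for_metric` is a hypothetical helper, and `SAMPLE` is a trimmed, illustrative document built from two entries shown in the outputs above.

```python
import json


def resources_for_metric(raw_json, metric):
    """Return the resource names in an APIResourceList that expose `metric`.

    Names look like "<resource>/<metric>", so compare the part after the slash.
    """
    listing = json.loads(raw_json)
    return [
        entry["name"]
        for entry in listing.get("resources", [])
        if entry.get("name", "").split("/", 1)[-1] == metric
    ]


# Trimmed sample for illustration; real input is the kubectl output.
SAMPLE = json.dumps(
    {
        "kind": "APIResourceList",
        "groupVersion": "custom.metrics.k8s.io/v1beta1",
        "resources": [
            {"name": "services/authentication_duration_seconds_sum"},
            {"name": "deployments.apps/upload_job_replica_scale_percent_dataset"},
        ],
    }
)
print(resources_for_metric(SAMPLE, "upload_job_replica_scale_percent_dataset"))
```

To use it against a cluster, save the output of `kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1` to a file and pass the file contents as `raw_json`. An empty result on EKS would confirm that the custom rule is not being served at all.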
What did you expect to happen?: The prometheus-adapter in the AWS EKS cluster should expose the same custom metric, 'upload_job_replica_scale_percent_dataset', as it does in the Alibaba Cloud ACK cluster.
Please provide the prometheus-adapter config:
(Identical to the Prometheus-Adapter values.yaml shown above.)