Kubectl --raw reporting an unknown metric even though it shows up in the list of known metrics
What happened?: The HPA reports <unknown> for a given metric, while other metrics seem to work fine:
"error_rate_metric" on Ingress/my-ingress (target value): <unknown> / 1
...
Warning FailedGetObjectMetric 83s (x95 over 25m) horizontal-pod-autoscaler unable to get metric error_rate_metric: Ingress on my-namespace my-ingress/unable to fetch metrics from custom metrics API: the server could not find the metric error_rate_metric for ingresses.networking.k8s.io my-ingress
What did you expect to happen?: The custom metric to report back with at least 1, given the query being used.
Please provide the prometheus-adapter config:
The config for this metric is fairly simple and, in theory, should always return some value thanks to the clamp_min:
- metricsQuery: clamp_min(round(sum(rate(<<.Series>>{<<.LabelMatchers>>,status=~"^5.."}[1m])) or vector(0.00001) / sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])), 0.01) * 100, 1)
  resources:
    template: <<.Resource>>
  name:
    as: error_rate_metric
  seriesFilters: []
  seriesQuery: '{__name__="nginx_ingress_controller_requests",ingress="my-ingress",namespace!=""}'
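For reference, given the resources.template mapping above, I believe the adapter would expand that metricsQuery for this Ingress to roughly the following (the Prometheus URL here is a placeholder for my setup):
❯ curl -sG 'http://prometheus.monitoring.svc:9090/api/v1/query' --data-urlencode 'query=
    clamp_min(
      round(
        sum(rate(nginx_ingress_controller_requests{namespace="my-namespace",ingress="my-ingress",status=~"^5.."}[1m]))
        or vector(0.00001)
        / sum(rate(nginx_ingress_controller_requests{namespace="my-namespace",ingress="my-ingress"}[1m]))
      , 0.01) * 100,
    1)' | jq .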
Please provide the HPA resource used for autoscaling:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  maxReplicas: 1
  metrics:
  - object:
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: my-ingress
      metric:
        name: nginx_ingress_controller_requests_rate_my_ingress_ingress
      target:
        averageValue: "75"
        type: AverageValue
        value: "0"
    type: Object
  - object:
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: my-ingress
      metric:
        name: nginx_ingress_controller_response_duration_p95_my_ingress_ingress
      target:
        type: Value
        value: "7"
    type: Object
  - object:
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: my-ingress
      metric:
        name: error_rate_metric
      target:
        type: Value
        value: "1"
    type: Object
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
Please provide the HPA status:
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetObjectMetric the HPA was unable to compute the replica count: unable to get metric error_rate_metric: Ingress on my-namespace my-ingress/unable to fetch metrics from custom metrics API: the server could not find the metric error_rate_metric for ingresses.networking.k8s.io my-ingress
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Please provide the prometheus-adapter logs with -v=6 around the time the issue happened:
Verbose logging in the adapter shows the following when the HPA requests the metric:
I0215 22:22:39.231327 1 httplog.go:132] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/ingresses.networking.k8s.io/my-ingress/error_rate_metric" latency="1.923735ms" userAgent="kube-controller-manager/v1.25.11 (linux/arm64) kubernetes/8cfcba0/system:serviceaccount:kube-system:horizontal-pod-autoscaler" audit-ID="a-b-c-d-e" srcIP="172.1.1.1:47017" resp=404
Other logs were present but not relevant to this error-rate metric failing.
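For completeness, this is roughly how I filtered the adapter logs for this metric (the namespace and deployment name are placeholders for my setup):
❯ kubectl -n monitoring logs deploy/prometheus-adapter --since=30m | grep -i error_rate_metric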
Anything else we need to know?:
Querying Prometheus with the query I expect the adapter to be translating this to does return data; to be clear, though, the result has no labels:
# query:
clamp_min(
round(
sum(
rate(nginx_ingress_controller_requests{ingress="my-ingress",namespace!="",status=~"^5.."}[1m])
) or vector(0.00001)
/
sum(
rate(nginx_ingress_controller_requests{ingress="my-ingress",namespace!=""}[1m])
)
, 0.01) * 100,
1)
# result:
{} - 1
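Since I suspect the empty label set might be what keeps the adapter from associating the result with the Ingress, here is a sketch of a label-preserving variant of the same query, run against the Prometheus HTTP API (the Prometheus URL is a placeholder; I have not confirmed this changes the adapter's behaviour):
❯ curl -sG 'http://prometheus.monitoring.svc:9090/api/v1/query' --data-urlencode 'query=
    clamp_min(
      round(
        (
          sum by (namespace, ingress) (rate(nginx_ingress_controller_requests{ingress="my-ingress",status=~"^5.."}[1m]))
          or
          0 * sum by (namespace, ingress) (rate(nginx_ingress_controller_requests{ingress="my-ingress"}[1m]))
        )
        /
        sum by (namespace, ingress) (rate(nginx_ingress_controller_requests{ingress="my-ingress"}[1m]))
      , 0.01) * 100,
    1)' | jq .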
When querying via the raw API paths in kubectl, I can see that this metric name does exist:
❯ kubectl --context=cluster-context get --raw '/apis/custom.metrics.k8s.io/v1beta1' | jq . | grep error_rate_metric
"name": "jobs.batch/error_rate_metric",
"name": "prometheuses.monitoring.coreos.com/error_rate_metric",
"name": "pods/error_rate_metric",
"name": "services/error_rate_metric",
"name": "ingresses.networking.k8s.io/error_rate_metric",
"name": "namespaces/error_rate_metric",
However, when I attempt to query it, I get a NotFound:
❯ kubectl --context=cluster-context get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/Ingress/my-ingress/error_rate_metric" | jq .
Error from server (NotFound): the server could not find the metric error_rate_metric for Ingress my-ingress
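For completeness, the plural-resource form of the path (the one the discovery listing and the adapter's 404 log line above use) fails the same way:
❯ kubectl --context=cluster-context get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/ingresses.networking.k8s.io/my-ingress/error_rate_metric" | jq .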
Either way, I expect this to show at least 1.
This may be related to the following issue; however, the fixes discussed there do not seem to have helped: https://github.com/kubernetes-sigs/prometheus-adapter/issues/150
Environment:
- prometheus-adapter version: prometheus-adapter-4.3.0
- prometheus version: 0.31.0
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.9", GitCommit:"a1a87a0a2bcd605820920c6b0e618a8ab7d117d4", GitTreeState:"clean", BuildDate:"2023-04-12T12:16:51Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"darwin/amd64"}
  Kustomize Version: v4.5.7
  Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.11", GitCommit:"8cfcba0b15c343a8dc48567a74c29ec4844e0b9e", GitTreeState:"clean", BuildDate:"2023-06-14T09:49:38Z", GoVersion:"go1.19.10", Compiler:"gc", Platform:"linux/arm64"}
- Cloud provider or hardware configuration: EKS
/assign @dgrisonnet
/triage accepted