vmagent: too many connections to APISERVER on AKS
Describe the bug
I have on average fewer than 1500 endpoints to scrape in my prod environment, as shown in the screenshot below.
When I run netstat inside the vmagent pod to count ESTABLISHED scrape connections, I get the expected value as long as I exclude the kube-apiserver endpoint. In the output below the apiserver endpoint is excluded:

```
~ $ netstat -atn | grep ESTABLISHED | grep -v 172.18.0.1:443 | wc -l
1491
```

If I count only the ESTABLISHED connections to the kube-apiserver, I get the following:

```
~ $ netstat -atn | grep ESTABLISHED | grep 172.18.0.1:443 | wc -l
1150
```

I also tried with `promscrape.disableKeepAlive: false` and got the same result. My question is: why does vmagent establish such a high number of connections to the kube-apiserver? I don't see any reason for it.
My cluster was built with a public kube-apiserver, and we are hitting a production issue with SNAT port exhaustion on AKS: the product application is not able to serve traffic to clients because those ports are consumed by vmagent connecting to the public kube-apiserver.
I also compared with other services like Prometheus; they do not create that many connections to the kube-apiserver.
To Reproduce
I am using the config below for vmagent to enable monitoring of pod and service scrape objects (VMPodScrape / VMServiceScrape). I am deploying it with the VM operator.
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: vmagent-1
  namespace: monitoring-system
spec:
  podScrapeNamespaceSelector: {}
  podScrapeSelector: {}
  serviceScrapeNamespaceSelector: {}
  serviceScrapeSelector: {}
  extraArgs:
    remoteWrite.forcePromProto: "true"
    envflag.enable: "true"
    envflag.prefix: "VM_"
    sortLabels: "true"
    promscrape.noStaleMarkers: "true"
    promscrape.maxScrapeSize: 200MB
    promscrape.kubernetesSDCheckInterval: 30s
    promscrape.cluster.replicationFactor: "2"
  containers:
    - name: config-reloader
      image: prometheus-operator/prometheus-config-reloader:v0.71.2
      resources:
        limits:
          cpu: "1"
          memory: "2Gi"
  securityContext:
    fsGroup: 65534
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534
  image:
    repository: victoriametrics/vmagent
    tag: "v1.93.12"
```
Version
v1.93.12
Logs
There are no errors in the vmagent pod logs. I can't share the logs publicly because they contain pod and namespace information.
Screenshots
No response
Used command-line flags
No response
Additional information
No response
Hello @prasadrajesh!
vmagent is supposed to expose the metric `vm_promscrape_discovery_kubernetes_group_watchers`, which shows the current number of watchers requesting updates from the k8s API. Each watcher can have up to 100 idle connections established. Could you please show the results of the `sum(vm_promscrape_discovery_kubernetes_group_watchers)` query over the same time interval as your screenshot?
Output of query `vm_promscrape_discovery_kubernetes_group_watchers`:
Output of query `sum(vm_promscrape_discovery_kubernetes_group_watchers)`:
I think each watcher creates only one connection. But why do we need such a high number of watchers? Can't we stay with one connection instead of creating a very high number of connections? The apiserver connections are almost 40% of all scrape endpoint connections.
> Can't we stay with one connection instead of creating a very high number of connections? The apiserver connections are almost 40% of all scrape endpoint connections.
The watcher is created for each specific service discovery config. It is basically defined by the combination of:
- API server address
- list of namespaces
- list of selectors
- proxy_url and settings like attach_metadata

You can see where a new watcher is created here: https://github.com/VictoriaMetrics/VictoriaMetrics/blob/a2ea8bc97b717fdcf48b9126015c9d6922f90a3d/lib/promscrape/discovery/kubernetes/api_watcher.go#L287-L307 (see the sketch below).
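For illustration only, here is a simplified sketch of the idea (this is not the actual code behind the link above, and the field names are hypothetical): SD configs that produce the same combination of these fields share a single group watcher, while any difference results in a new one.

```go
// Simplified illustration of watcher deduplication; the struct and key format
// here are hypothetical and do not match the real api_watcher.go types.
package main

import "fmt"

// sdConfig holds the fields that, per the list above, distinguish group watchers.
type sdConfig struct {
	apiServer      string
	namespaces     []string
	selectors      string // label/field selectors flattened to a string
	proxyURL       string
	attachMetadata bool
}

// groupKey builds a deduplication key: configs with equal keys reuse one watcher.
func groupKey(c sdConfig) string {
	return fmt.Sprintf("apiServer=%s namespaces=%v selectors=%q proxyURL=%q attachMetadata=%v",
		c.apiServer, c.namespaces, c.selectors, c.proxyURL, c.attachMetadata)
}

func main() {
	watchers := map[string]int{} // key -> number of SD configs sharing it

	a := sdConfig{apiServer: "https://172.18.0.1:443", namespaces: []string{"test"}}
	b := a // identical config, shares the same watcher
	c := sdConfig{apiServer: "https://172.18.0.1:443", namespaces: []string{"prod"}}

	for _, cfg := range []sdConfig{a, b, c} {
		watchers[groupKey(cfg)]++
	}
	fmt.Println("unique watchers:", len(watchers)) // 2: a and b share one, c gets its own
}
```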
So I'd suggest verifying how many SD configs you have per vmagent to understand how many unique watchers it could have created. And to check whether there is a bug in the vmagent code, could you verify how `max(vm_promscrape_discovery_kubernetes_group_watchers)` behaved over a longer time range of a couple of days?
Thanks @hagen1778 for the quick reply.
Isn't it possible to multiplex multiple watches over a single HTTP/2 connection? That way, even if I have 1000 watchers, I would have only one connection instead of one connection (and one goroutine) per watcher.
It is not a very good approach, taking into account that every SD config should be updated independently. So we should expect that if you have 1k SD configs, all of them could be updated concurrently to deliver changes as fast as possible.
Btw, I've checked metrics from a couple of production vmagent installations in different environments: none of them has 1k watchers. It is usually below 20. Only one case has 170 watchers. Do you use some standard SD configs?
Answer to your question: I have only one SD config type, `kubernetes_sd_configs`, and the scrape targets are defined in `vmagent.env.yaml`.
Suggestion: my point was to enable multiplexing multiple watches over a single HTTP/2 connection, like `prometheus-config-reloader` is doing.
Workaround: to update SD configs I am using `prometheus-config-reloader`, and I was hoping that the `prometheus-config-reloader` container (which runs inside the same pod as vmagent) would take responsibility for updating the `vmagent.env.yaml` file instead of vmagent doing it itself. But vmagent still takes responsibility for watching the APIs.
So, @hagen1778, how can I disable the API watch in vmagent and tell vmagent to just follow `vmagent.env.yaml` (which is created and kept updated by `prometheus-config-reloader`)?
> Actually, to update SD configs I am using prometheus-config-reloader, and I was hoping that the prometheus-config-reloader container (which runs inside the same pod as vmagent) would take responsibility for updating the vmagent.env.yaml file instead of vmagent doing it itself. But vmagent still takes responsibility for watching the APIs.
I think prometheus-config-reloader just updates the configs for SD; it doesn't actually do the SD. So an option for disabling the discovery won't work.
@Haleygo, do you have any opinion on this? It looks related to https://github.com/VictoriaMetrics/operator/pull/267
Hello @prasadrajesh,
> How can I disable the API watch in vmagent and tell vmagent to just follow vmagent.env.yaml

Like @hagen1778 mentioned, vmagent does just follow `vmagent.env.yaml`, which is updated by config-reloader, but it still needs to watch the apiserver to dynamically discover the real scrape targets, just like Prometheus does.
For example, if you create a VMServiceScrape
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: vmalertmanager-t-victoria-metrics-k8s-stack
  namespace: test
spec:
  endpoints:
    - attach_metadata: {}
      path: /metrics
      port: http
  namespaceSelector: {}
  selector:
    matchExpressions:
      - key: operator.victoriametrics.com/additional-service
        operator: DoesNotExist
    matchLabels:
      app.kubernetes.io/component: monitoring
      app.kubernetes.io/instance: t-victoria-metrics-k8s-stack
      app.kubernetes.io/name: vmalertmanager
      managed-by: vm-operator
```
vm-operator will add a scrape job to vmagent's config file (which is mounted from a Secret by default):
```yaml
- job_name: serviceScrape/test/vmalertmanager-t-victoria-metrics-k8s-stack/0
  honor_labels: false
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - test
  metrics_path: /metrics
  relabel_configs:
    - action: keep
      source_labels:
        - __meta_kubernetes_service_label_app_kubernetes_io_component
      regex: monitoring
    - action: keep
      source_labels:
        - __meta_kubernetes_service_label_app_kubernetes_io_instance
      regex: t-victoria-metrics-k8s-stack
    - action: keep
      source_labels:
        - __meta_kubernetes_service_label_app_kubernetes_io_name
      regex: vmalertmanager
    - target_label: endpoint
      replacement: http
```
Then config-reloader will detect this file change and reload the vmagent process, and vmagent will create a new watcher with a URL like `https://{{apiserver-address}}/api/v1/namespaces/test/endpoints?watch=1&allowWatchBookmarks=true&timeoutSeconds=xx` to find all the endpoints to use as scrape targets.
So there will be at least one watch connection for each scrape job. If the job role is `endpoints` or `endpointslice`, there will be two more connections to watch the related pod and service resources.
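For illustration, each such watch is just a long-lived HTTPS request. Here is a rough sketch of what one of these streams looks like (this is not vmagent's actual client code; it assumes the standard in-cluster service account token and CA paths):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Standard in-cluster credentials (assumption: running inside a pod).
	token, _ := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	caCert, _ := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caCert)

	// With a custom Transport like this (ForceAttemptHTTP2 left unset),
	// Go speaks HTTP/1.1, so every concurrent watch stream occupies its own
	// TCP connection to the apiserver.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}}

	url := "https://172.18.0.1:443/api/v1/namespaces/test/endpoints?watch=1&allowWatchBookmarks=true&timeoutSeconds=300"
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+string(token))

	resp, err := client.Do(req) // the response body streams watch events until the timeout
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("watch stream opened:", resp.Status, "proto:", resp.Proto)
}
```

A scrape job with role `endpoints` keeps three such streams open (endpoints, pods, services), which is why the apiserver connection count grows with the number of scrape jobs.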
Could you share how many scrape jobs you have in your scrape config? And how does `vm_promscrape_discovery_kubernetes_url_watchers` behave, does it go down when you restart vmagent?
> @Haleygo do you have any opinion on this? It looks related to https://github.com/VictoriaMetrics/operator/pull/267
@hagen1778 No, it's not related, since https://github.com/VictoriaMetrics/operator/pull/267 is about reducing pressure on the apiserver from the operator when generating scrape configs.
Thanks for the explanation @Haleygo! Do you think there is room for improvement as @prasadrajesh mentioned?
I wrote a service that converts HTTP/1.1 requests to HTTP/2 (so, by default, 100 connections are multiplexed into 1 connection). I put that service between vmagent and the apiserver. It fixed my issue.
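For anyone interested, a minimal sketch of that idea could look like the code below (this is not my exact service; it assumes `golang.org/x/net/http2`, an `APISERVER_URL` environment variable, and it skips proper certificate verification for brevity):

```go
package main

import (
	"crypto/tls"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"

	"golang.org/x/net/http2"
)

func main() {
	// e.g. APISERVER_URL=https://172.18.0.1:443
	target, err := url.Parse(os.Getenv("APISERVER_URL"))
	if err != nil {
		log.Fatal(err)
	}

	proxy := httputil.NewSingleHostReverseProxy(target)
	director := proxy.Director
	proxy.Director = func(req *http.Request) {
		director(req)
		req.Host = target.Host // present the apiserver's host, not the proxy's
	}
	// Flush watch events to vmagent as soon as they arrive.
	proxy.FlushInterval = -1
	// http2.Transport negotiates "h2" via ALPN and multiplexes all outgoing
	// requests (including long-lived watches) over a small shared connection
	// pool instead of one TCP connection per request.
	proxy.Transport = &http2.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // demo only; verify the cluster CA in practice
	}

	// vmagent is then pointed at http://127.0.0.1:8443 instead of the public apiserver.
	log.Fatal(http.ListenAndServe("127.0.0.1:8443", proxy))
}
```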
Is there any update on this? We are currently running into the same issue (SNAT port exhaustion) on our public API AKS cluster. For now we have added a NAT Gateway to mitigate it, but that's quite expensive.