amazon-vpc-cni-k8s
Expose cni-metrics-helper metrics for Prometheus to scrape
What would you like to be added: Expose the metrics collected by cni-metrics-helper so that Prometheus can scrape them.
Why is this needed: Currently, Prometheus cannot scrape metrics from cni-metrics-helper, so CNI-related metrics cannot be viewed in Prometheus.
@aravindhkudiyarasan if you are interested in contributing, you are welcome to create a PR.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
Issue closed due to inactivity.
@haouc I would like to work on this; please assign it to me.
@aravindhkudiyarasan @haouc I can see that ipamd metrics are already exposed on port 61678 at path /metrics for Prometheus to scrape. I am not clear which other metrics you need. Could you please clarify?
Is this about Prometheus metrics for the cni-metrics-helper /metrics endpoint itself, such as the number of HTTP requests, 200 responses, etc.?
@haouc, any input on this?
It's not just about ipamd. What we ultimately need is for the cni-metrics-helper pod to expose its metrics directly so that Prometheus can scrape them.
Currently, it has to push them to CloudWatch, and then we have to export them from there.
On Fri, 23 Dec 2022 at 3:00 PM, balajisa wrote: "@haouc any inputs on this?"
Got it, thanks.
@haouc, can you review this? This is draft code, and I am looking for suggestions to improve it.
Not sure I follow the design here. Why are the metrics currently exposed at port 61678 at path /metrics not scrapeable by prometheus?
@jdn5126 It is scrapeable. Port 61678 is exposed by the aws-vpc-cni application. A separate app, cni-metrics-helper, scrapes the metrics from all replicas of the aws-vpc-cni app, aggregates them, does some post-processing, and then pushes the result to the AWS CloudWatch service. So today, the overall VPC CNI metrics can only be viewed in CloudWatch.
We are trying to expose those aggregated metrics to Prometheus instead of pushing them to CloudWatch. At least, this is what I understand.
Got it, thank you for explaining. So the design is for the cni-metrics-helper pod to expose all the metrics it already exports to CloudWatch on a port that Prometheus can pull from, and to provide configuration by which customers can publish to CloudWatch, expose for Prometheus, or both.
Yeah, that's right.
@balajisa09 thank you, I will start reviewing this soon
@jdn5126 @balajisa09 So, how do you ultimately configure a ServiceMonitor to scrape the metrics? The cni-metrics-helper chart's Deployment does not have a Service, so a ServiceMonitor cannot scrape it. Below is my ServiceMonitor config for Prometheus, which did not work.
- name: "vpc-cni-metrics"
  selector:
    matchLabels:
      k8s-app: cni-metrics-helper
  namespaceSelector:
    any: true
  endpoints:
    - port: "61678"
      path: '/metrics'
I noticed that a PodMonitor can be used to scrape the aws-node DaemonSet. I tried it but did not get any metrics (https://grafana.com/grafana/dashboards/16032-aws-cni-metrics/).
I am not sure which one I should use.
The dashboard at https://grafana.com/grafana/dashboards/10970-k8s-cni-metrics/ mentions: "Also, it requires annotations to be added to aws-node daemon set." I am not sure which annotations it needs.
@jdn5126 @balajisa09 Can we get this PR merged soon?
Sure. I got busy with a job search; I will get it merged soon.
@xyfleet as you mentioned, cni-metrics-helper does not have a Service, so a ServiceMonitor does not work here.
As for the Grafana dashboards you linked, neither are maintained by AWS, so I am not sure if they work, ever worked, or what annotation they are referring to.
Currently, cni-metrics-helper only supports streaming aggregated metrics to CloudWatch. This issue is about adding an HTTP server that Prometheus can pull metrics from.
The aws-node pods already run an HTTP server that exposes per-node metrics for Prometheus to pull. I cannot find a CNI-specific guide online, but you can use a PodMonitor with Prometheus to collect these metrics.
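For clusters that run plain Prometheus without the Prometheus Operator (and therefore have no PodMonitor CRD), a scrape job with Kubernetes pod discovery can reach the aws-node pods directly. This is a minimal sketch, not an AWS-provided config. It assumes the pods run in kube-system with the label `k8s-app: aws-node`, have a container named `aws-node`, and serve ipamd metrics on port 61678 at /metrics as described in this thread; the job name is arbitrary, and Prometheus needs RBAC permission to list pods in kube-system.

```yaml
scrape_configs:
  - job_name: aws-node                 # arbitrary job name (assumption)
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - kube-system              # assumed namespace of the aws-node DaemonSet
    relabel_configs:
      # Keep only targets from pods labeled k8s-app=aws-node
      # (the "-" in the label name is sanitized to "_" in the meta label).
      - source_labels: [__meta_kubernetes_pod_label_k8s_app]
        action: keep
        regex: aws-node
      # Keep only the aws-node container, in case the pod has other containers.
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: keep
        regex: aws-node
      # Point the target at port 61678 on the pod IP.
      - source_labels: [__meta_kubernetes_pod_ip]
        action: replace
        regex: (.+)
        replacement: $1:61678
        target_label: __address__
```

Rewriting `__address__` to a fixed port, rather than relying on a declared container port, means the job still works if the container spec does not list port 61678.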
@jdn5126 Thanks a lot. Will the new PR mentioned above allow us to use a ServiceMonitor to scrape metrics from cni-metrics-helper?
For a PodMonitor scraping the aws-node DaemonSet, does aws-node serve metrics on port 61678 at path /metrics, or somewhere else?
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: aws-node
  labels:
    k8s-app: aws-node
spec:
  selector:
    matchLabels:
      k8s-app: aws-node
  podMetricsEndpoints:
    - port: 61678
      path: '/metrics'
The new PR will not enable a ServiceMonitor. Using cni-metrics-helper with Prometheus will still require a PodMonitor (this needs to be tested by the PR submitter).
For the aws-node DaemonSet, the metrics are published on :61678/metrics, so yes, that configuration looks correct to me, but I have not tested it.
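Two details of the PodMonitor API may explain an empty result with the config quoted above. First, `podMetricsEndpoints[].port` takes the *name* of a container port, not a number. Second, a PodMonitor with no `namespaceSelector` only looks for pods in its own namespace by default, while aws-node runs in kube-system. The sketch below is hedged and untested. It assumes that the aws-node container names port 61678 `metrics` (confirm with `kubectl -n kube-system get ds aws-node -o yaml`), that the PodMonitor lives in a hypothetical `monitoring` namespace, and that your Prometheus resource selects PodMonitors labeled `release: prometheus`, as kube-prometheus-stack commonly does.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: aws-node
  namespace: monitoring          # hypothetical: wherever your operator watches PodMonitors
  labels:
    release: prometheus          # assumption: must match the Prometheus podMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - kube-system              # aws-node pods live here, not in the PodMonitor's namespace
  selector:
    matchLabels:
      k8s-app: aws-node
  podMetricsEndpoints:
    - port: metrics              # container port NAME, assumed to map to 61678
      path: /metrics
```

If the container port turns out to be unnamed, the PodMonitor cannot reference it by name, and a plain scrape job that rewrites the target address is the simpler route.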
Hello, many of us use Prometheus's built-in Kubernetes service discovery (kubernetes_sd_configs) for scraping.
Combined with the usual relabeling rules, it uses pod annotations to tell Prometheus whether to scrape a pod and which port and path to use.
annotations:
  prometheus.io/path: /metrics
  prometheus.io/port: "61678"
  prometheus.io/scrape: "true"
This is far simpler in many situations. Could this be added as an option for the plugin?
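The `prometheus.io/*` annotations are a community convention, not something kubernetes_sd_configs interprets by itself. A scrape job has to implement the convention through relabeling. The sketch below shows a job of the kind many Prometheus installations already ship; the job name is arbitrary. With a job like this in place, the annotations above would need to go on the aws-node pod template (`spec.template.metadata.annotations` of the DaemonSet), because pod discovery reads pod annotations, not the DaemonSet's own metadata.

```yaml
scrape_configs:
  - job_name: kubernetes-pods          # arbitrary job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Scrape only pods annotated prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use prometheus.io/path as the metrics path when it is set.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      # Replace the discovered port with the one from prometheus.io/port.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

Annotation values must be strings, which is why `"61678"` and `"true"` are quoted in the pod annotations.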
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
Issue closed due to inactivity.
We can reuse USE_CLOUDWATCH to either push to CloudWatch or set up Prometheus. We can discuss this offline.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
/not stale
⚠️COMMENT VISIBILITY WARNING⚠️
Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.