amazon-vpc-cni-k8s

Expose metrics for cni-metrics-helper for prometheus to scrape it

Open aravindhkudiyarasan opened this issue 2 years ago • 1 comments

What would you like to be added: Kindly expose metrics for cni-metrics-helper

Why is this needed: Currently Prometheus is unable to scrape metrics from cni-metrics-helper, so we cannot view CNI-related metrics in Prometheus.

aravindhkudiyarasan avatar Aug 26 '22 09:08 aravindhkudiyarasan

@aravindhkudiyarasan if you are interested in contributing, you are welcome to create a PR.

haouc avatar Sep 27 '22 21:09 haouc

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar Nov 27 '22 00:11 github-actions[bot]

Issue closed due to inactivity.

github-actions[bot] avatar Dec 11 '22 00:12 github-actions[bot]

@haouc I would like to work on this, please assign this to me

balajisa09 avatar Dec 17 '22 15:12 balajisa09

@aravindhkudiyarasan @haouc I can see that ipamD metrics are already exposed on port 61678 at path /metrics for Prometheus to scrape. I am not clear about what other metrics you need. Could you please clarify?

Is this supposed to be Prometheus metrics for the cni-metrics-helper /metrics endpoint itself, such as the number of HTTP requests, 200 responses, etc.?

balajisa09 avatar Dec 19 '22 18:12 balajisa09

@haouc any input on this?

balajisa09 avatar Dec 23 '22 09:12 balajisa09

It's not just about ipamd. The ultimate solution we need is for the cni-metrics-helper pod to expose its metrics directly so that Prometheus can scrape them.

Currently it has to push them to CloudWatch, and then we have to export them.

aravindhkudiyarasan avatar Dec 23 '22 09:12 aravindhkudiyarasan

got it. Thanks

balajisa09 avatar Dec 26 '22 09:12 balajisa09

@haouc can you review this? It is draft code, and I am looking for suggestions to improve it.

balajisa09 avatar Jan 02 '23 17:01 balajisa09

Not sure I follow the design here. Why are the metrics currently exposed at port 61678 at path /metrics not scrapeable by prometheus?

jdn5126 avatar Jan 04 '23 17:01 jdn5126

@jdn5126 it is scrapeable. Port 61678 is exposed by the aws-vpc-cni application. There is a cni-metrics-helper app that scrapes the metrics from all replicas of the aws-vpc-cni app, aggregates them, and does some post-processing; the result is then pushed to the AWS CloudWatch service. So right now you can see the overall vpc-cni app metrics only in CloudWatch.

We are trying to expose those aggregated metrics to Prometheus instead of pushing them to CloudWatch. At least, this is what I understand.

balajisa09 avatar Jan 04 '23 17:01 balajisa09

Got it, thank you for explaining. So the design is for the cni-metrics-helper pod to expose all metrics that it already exports to CloudWatch on a port that Prometheus can pull from, and to provide configuration by which customers can publish to CloudWatch, expose for Prometheus, or both.
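
A configuration along these lines might look like the fragment below. This is only a sketch of the idea, not the merged design: USE_CLOUDWATCH is the existing setting mentioned later in this thread, while the EXPOSE_PROMETHEUS_METRICS variable, the port name, and the port number 61681 are hypothetical placeholders.

```yaml
# Illustrative fragment of the cni-metrics-helper container spec.
# Only USE_CLOUDWATCH comes from this thread; everything marked
# "hypothetical" is a placeholder, not a real setting.
containers:
  - name: cni-metrics-helper
    env:
      - name: USE_CLOUDWATCH
        value: "true"                      # keep publishing to CloudWatch
      - name: EXPOSE_PROMETHEUS_METRICS    # hypothetical toggle
        value: "true"                      # also serve /metrics over HTTP
    ports:
      - name: metrics                      # hypothetical port name
        containerPort: 61681               # hypothetical port number
```

With two independent toggles, setting either value to "false" would select a single sink, and a named container port would let a PodMonitor refer to it by name.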

jdn5126 avatar Jan 04 '23 19:01 jdn5126

yeah. That's right.

balajisa09 avatar Jan 05 '23 04:01 balajisa09

@balajisa09 thank you, I will start reviewing this soon

jdn5126 avatar Jan 05 '23 15:01 jdn5126

@jdn5126 @balajisa09 So, in the end, how do you configure a ServiceMonitor to scrape the metrics? The cni-metrics-helper chart's deployment does not have a Service, so a ServiceMonitor cannot scrape it. Below is my ServiceMonitor config for Prometheus, which did not work.


- name: "vpc-cni-metrics"
  selector:
    matchLabels:
      k8s-app: cni-metrics-helper
  namespaceSelector:
    any: true
  endpoints:
    - port: "61678"
      path: '/metrics'

I noticed that one can use a PodMonitor to scrape the aws-node DaemonSet. I tried that and did not get any metrics. (https://grafana.com/grafana/dashboards/16032-aws-cni-metrics/)

I am not sure which one I should use.

https://grafana.com/grafana/dashboards/10970-k8s-cni-metrics/ This link mentions that "Also, it requires annotations to be added to aws-node daemon set." I am not sure what annotation it needs.

xyfleet avatar Jan 31 '23 21:01 xyfleet

@jdn5126 @balajisa09 Can we get this PR merged soon?

aravindhkudiyarasan avatar Feb 01 '23 09:02 aravindhkudiyarasan

Sure. Got busy with job search. Will get it merged soon.

balajisa09 avatar Feb 01 '23 10:02 balajisa09

@xyfleet as you mentioned, cni-metrics-helper does not have a service, so ServiceMonitor does not work here.

As for the Grafana dashboards you linked, neither is maintained by AWS, so I am not sure if they work, ever worked, or what annotation they are referring to.

Currently, cni-metrics-helper only supports streaming aggregated metrics to CloudWatch. This issue is to add an HTTP server that Prometheus can pull metrics from.

The aws-node pods already run an HTTP server to expose per-node metrics for Prometheus to pull from. I cannot find any CNI-specific guide online, but you can use PodMonitor and Prometheus to get metrics.

jdn5126 avatar Feb 01 '23 17:02 jdn5126

@jdn5126 Thanks a lot. Will the new PR mentioned above allow us to use ServiceMonitor to scrape the metrics from cni-metrics-helper?

For the PodMonitor scraping the aws-node DaemonSet, does aws-node serve metrics on port 61678 at path /metrics, or somewhere different?


apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: aws-node
  labels:
    k8s-app: aws-node
spec:
  selector:
    matchLabels:
      k8s-app: aws-node
  podMetricsEndpoints:
    - port: 61678
      path: '/metrics'


xyfleet avatar Feb 01 '23 19:02 xyfleet

The new PR will not allow us to use ServiceMonitor. It will still require PodMonitor to be used for cni-metrics-helper with prometheus (this will require testing by PR submitter).

For aws-node daemonset, the metrics are published on :61678/metrics, yeah. That configuration looks correct to me, but I have not tested it.
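
One detail worth checking in a PodMonitor like the one above: in the Prometheus Operator API, the port field of podMetricsEndpoints refers to a container port by name rather than by number. A minimal sketch, assuming the aws-node container declares port 61678 under the name metrics and runs in kube-system (both are assumptions to verify against your own DaemonSet):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: aws-node
  namespace: kube-system        # assumption: aws-node runs here
  labels:
    k8s-app: aws-node
spec:
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      k8s-app: aws-node
  podMetricsEndpoints:
    # "port" takes the *name* of a container port, not its number.
    # "metrics" is an assumption; confirm the name with:
    #   kubectl -n kube-system get daemonset aws-node -o yaml
    - port: metrics
      path: /metrics
```

If the Prometheus resource restricts which PodMonitors it picks up through a podMonitorSelector, the PodMonitor also needs the matching label.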

jdn5126 avatar Feb 01 '23 20:02 jdn5126

Hello, many of us use the built-in Kubernetes service discovery (kubernetes_sd) component of Prometheus to do our scraping.

That setup uses pod annotations to tell Prometheus whether or not to scrape a pod, and what the details are for that particular target.

annotations:
  prometheus.io/path: /metrics
  prometheus.io/port: "61678"
  prometheus.io/scrape: "true"

This is far simpler for so many situations. Is there any way we can get this incorporated as an option for the plugin?
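
For context, the prometheus.io/* annotations are a convention rather than something Prometheus interprets on its own; they only take effect when the scrape job contains relabeling rules that read them. A commonly used sketch of such a job is shown below. The job name is arbitrary, and the annotation names are assumed to match the ones listed above.

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use prometheus.io/path as the metrics path when it is set.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Rewrite the target address to use the prometheus.io/port value.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

Under this approach, supporting annotation-based scraping would mostly be a matter of letting users set pod annotations on the relevant pods.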

thekuffs avatar Mar 02 '23 21:03 thekuffs

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar May 02 '23 00:05 github-actions[bot]

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar Jul 02 '23 00:07 github-actions[bot]

Issue closed due to inactivity.

github-actions[bot] avatar Jul 17 '23 00:07 github-actions[bot]

We can reuse USE_CLOUDWATCH to either push to CW or set up Prometheus. We can discuss this offline.

jayanthvn avatar Jul 18 '23 17:07 jayanthvn

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar Sep 18 '23 00:09 github-actions[bot]

/not stale

jayanthvn avatar Sep 20 '23 18:09 jayanthvn

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

github-actions[bot] avatar Nov 15 '23 20:11 github-actions[bot]