Tenant Metrics

Open oliverbaehler opened this issue 4 years ago • 15 comments

Describe the feature

We would like to have more metrics exported about the tenants currently controlled by the operator. Some metrics that would be helpful:

  • Basic info metric (Tenant Active State, Namespace Quota and used Namespaces, so we can count up how many tenants there are)
  • Cordoned tenants (Consider the tenant cordoning label, to tell which tenants are cordoned and which are not)
  • Quota Usage (Evaluate how much of a quota spanning a tenant is used vs. the maximum given quota). The same on a per-namespace basis

These are the most important ones I could think of. Another interesting feature (which most other metric exporters lack) is the possibility to add labels to my resource which are then added as metric labels. Let's say I want to be able to show all tenants of a certain customer. I would add the label metrics.clastix.io/customer=a to a Tenant CR, and then the label customer with the value a would show up in the metric. This way everyone has much greater flexibility to organize metrics, even if they come from the same controller. The controller would simply check whether any label matches metrics.clastix.io/* and then register it with the actual metric.
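The label propagation described above could be sketched roughly as follows. This is only an illustration of the idea, not Capsule code: the function names metricLabels and sanitizeLabelName are hypothetical, and the sanitizing step is an added assumption, needed because Kubernetes label names may contain characters (such as - or .) that Prometheus label names do not allow.

```go
package main

import (
	"fmt"
	"strings"
)

// metricLabelPrefix is the label prefix proposed in this issue.
const metricLabelPrefix = "metrics.clastix.io/"

// sanitizeLabelName replaces every character that is not valid in a
// Prometheus label name ([a-zA-Z_][a-zA-Z0-9_]*) with an underscore.
func sanitizeLabelName(name string) string {
	var b strings.Builder
	for i, r := range name {
		valid := r == '_' ||
			(r >= 'a' && r <= 'z') ||
			(r >= 'A' && r <= 'Z') ||
			(i > 0 && r >= '0' && r <= '9')
		if valid {
			b.WriteRune(r)
		} else {
			b.WriteByte('_')
		}
	}
	return b.String()
}

// metricLabels (hypothetical helper) picks the resource labels that start
// with metricLabelPrefix and returns them with the prefix stripped.
func metricLabels(resourceLabels map[string]string) map[string]string {
	out := make(map[string]string)
	for key, value := range resourceLabels {
		if !strings.HasPrefix(key, metricLabelPrefix) {
			continue
		}
		name := strings.TrimPrefix(key, metricLabelPrefix)
		if name == "" {
			continue // nothing left after the prefix, skip it
		}
		out[sanitizeLabelName(name)] = value
	}
	return out
}

func main() {
	tenantLabels := map[string]string{
		"metrics.clastix.io/customer":    "a",
		"metrics.clastix.io/cost-center": "42",
		"app.kubernetes.io/name":         "ignored",
	}
	fmt.Println(metricLabels(tenantLabels))
}
```

Only the two metrics.clastix.io/ labels survive the filter; cost-center becomes cost_center so it can be used as a Prometheus label name.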

What would the new user story look like?

Doesn't change.

Expected behavior

A clear and concise description of what you expect to happen.

oliverbaehler avatar Oct 19 '21 07:10 oliverbaehler

@oliverbaehler thanks for submitting this request. Implementing them is pretty easy and straightforward. Would you like to submit a PR too?

bsctl avatar Oct 19 '21 07:10 bsctl

@bsctl Yes I will try

oliverbaehler avatar Oct 19 '21 07:10 oliverbaehler

@oliverbaehler any progress on this issue? Do you need any help?

bsctl avatar Nov 16 '21 09:11 bsctl

In consultation with @oliverbaehler I would like to take on this topic. However, the implementation does not seem as easy as @bsctl described. How does Capsule implement metrics in general, and how can custom metrics be added?

adberger avatar Dec 07 '21 06:12 adberger

@prometherion Any idea where to get started?

adberger avatar Dec 21 '21 08:12 adberger

Hey @adberger, thanks for the ping here!

Since Capsule is built on top of controller-runtime, we already have some controller metrics exported: this is really convenient since we don't have to put in place any additional webserver for metrics exposure, and implement the collector.

I started working on this a few days ago and just pushed the work in progress to the branch issues/451. If you could take a look at it, that would be perfect, so I can share the remarks I have so far.

Let's take the example of summarizing the total overall count of cordoned and active Tenants: I would expect a metric named capsule_tenants_status_count with two labels, such as cordoned and active. Exposing this kind of metric using prometheus.NewCounterFunc is not possible, since it does not let us use the labels that would do the trick.

However, since you're really active on this topic, it would be great to have feedback from you regarding the structure of these metrics: would it be bad to have metrics such as capsule_tenants_active_count and capsule_tenants_cordoned_count?

To me, honestly, having two time series is pretty odd, since Prometheus labels are solving this kind of problem.
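To make the difference between the two shapes concrete, here is a small dependency-free sketch that renders the same counts as one metric with a label. It does not use the Prometheus client library (where a labeled vector type such as GaugeVec would be the usual tool), and the Tenant struct, the state label name, and the tenant names are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tenant is a hypothetical, stripped-down stand-in for the Capsule Tenant
// resource; only the field needed for this sketch is modeled.
type Tenant struct {
	Name     string
	Cordoned bool
}

// countByState tallies tenants into "active" and "cordoned" buckets.
func countByState(tenants []Tenant) map[string]int {
	counts := map[string]int{"active": 0, "cordoned": 0}
	for _, t := range tenants {
		if t.Cordoned {
			counts["cordoned"]++
		} else {
			counts["active"]++
		}
	}
	return counts
}

// renderLabeled prints a single metric name with one sample per state,
// in the Prometheus text exposition format, sorted by state.
func renderLabeled(counts map[string]int) string {
	states := make([]string, 0, len(counts))
	for s := range counts {
		states = append(states, s)
	}
	sort.Strings(states)

	var b strings.Builder
	for _, s := range states {
		fmt.Fprintf(&b, "capsule_tenants_status_count{state=%q} %d\n", s, counts[s])
	}
	return b.String()
}

func main() {
	tenants := []Tenant{
		{Name: "oil"},
		{Name: "gas", Cordoned: true},
		{Name: "water"},
	}
	fmt.Print(renderLabeled(countByState(tenants)))
}
```

With the label approach, a total across states is a plain aggregation over one metric name; with two separately named metrics, the query has to add them explicitly.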

prometherion avatar Dec 21 '21 13:12 prometherion

Thank you very much. I'll look into it by January 9th and give feedback.

adberger avatar Dec 27 '21 07:12 adberger

@prometherion I might be doing something wrong but I don't see the metrics yet.

I followed https://capsule.clastix.io/docs/contributing/development/#fork-build-and-deploy-capsule with a kind cluster. Then I ran kubectl port-forward service/capsule-controller-manager-metrics-service -n capsule-system 8888:8080 and opened http://localhost:8888/metrics in a browser.

Regarding your question: Having two metrics (capsule_tenants_active_count & capsule_tenants_cordoned_count) seems fine to me. You just need different PromQL queries, but the result stays the same.

adberger avatar Jan 06 '22 08:01 adberger

I might be doing something wrong but I don't see the metrics yet

Are you referring to the custom or basic ones?

Because for the latter:

curl -s localhost:8888/metrics | wc -l
1618

prometherion avatar Jan 11 '22 15:01 prometherion

I meant the custom ones

adberger avatar Jan 13 '22 07:01 adberger

Solved with https://github.com/kubernetes/kube-state-metrics/blob/main/docs/customresourcestate-metrics.md

Example with kube-prometheus-stack Helm Chart (https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack):

kube-state-metrics:
  rbac:
    extraRules:
      - apiGroups: [ "capsule.clastix.io" ]
        resources: ["tenants"]
        verbs: [ "list", "watch" ]
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
          - groupVersionKind:
              group: capsule.clastix.io
              kind: "Tenant"
              version: "v1beta2"
            labelsFromPath:
              name: [metadata, name]
            metrics:
              - name: "tenant_size"
                help: "Count of namespaces in the tenant"
                each:
                  type: Gauge
                  gauge:
                    path: [status, size]
                commonLabels:
                  custom_metric: "yes"
                labelsFromPath:
                  capsule_tenant: [metadata, name]
                  kind: [ kind ]
              - name: "tenant_state"
                help: "The operational state of the Tenant"
                each:
                  type: StateSet
                  stateSet:
                    labelName: state
                    path: [status, state]
                    list: [Active, Cordoned]
                commonLabels:
                  custom_metric: "yes"
                labelsFromPath:
                  capsule_tenant: [metadata, name]
                  kind: [ kind ]
              - name: "tenant_namespaces_info"
                help: "Namespaces of a Tenant"
                each:
                  type: Info
                  info:
                    path: [status, namespaces]
                    labelsFromPath:
                      tenant_namespace: []
                commonLabels:
                  custom_metric: "yes"
                labelsFromPath:
                  capsule_tenant: [metadata, name]
                  kind: [ kind ]
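For readers unfamiliar with the StateSet type used for tenant_state above: a StateSet exposes one sample per listed state, with value 1 for the state the resource is currently in and 0 for the others. The following sketch mimics that expansion. It is an illustration of the semantics only, not kube-state-metrics code; the names stateSample and stateSetSamples are hypothetical, and the exact string comparison is a simplifying assumption.

```go
package main

import "fmt"

// stateSample is one sample of a StateSet: a state name and a 0/1 value.
type stateSample struct {
	State string
	Value int
}

// stateSetSamples expands the current state into one sample per known
// state: 1 for the state that matches, 0 for every other state.
func stateSetSamples(current string, states []string) []stateSample {
	samples := make([]stateSample, 0, len(states))
	for _, s := range states {
		v := 0
		if s == current {
			v = 1
		}
		samples = append(samples, stateSample{State: s, Value: v})
	}
	return samples
}

func main() {
	// A cordoned tenant, using the state list from the config above.
	for _, s := range stateSetSamples("Cordoned", []string{"Active", "Cordoned"}) {
		fmt.Printf("tenant_state{state=%q} %d\n", s.State, s.Value)
	}
}
```

Counting cordoned tenants then amounts to summing the samples whose state label is Cordoned.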

adberger avatar Oct 23 '23 06:10 adberger

This is definitely gold, thanks for sharing @adberger: could we transform this issue from a code-based feature to a documentation one?

Looking forward to reviewing a PR from you!

prometherion avatar Oct 24 '23 08:10 prometherion

@prometherion Sorry, I currently don't have any intention to make a contribution to the documentation of capsule because I'm quite busy at the moment.

adberger avatar Oct 24 '23 09:10 adberger