Tenant Metrics
Describe the feature
We would like more metrics to be exported about the tenants controlled by the operator. Some metrics that would be helpful:
- Basic info metric (Tenant Active State, Namespace Quota and used Namespaces, so we can count up how many tenants there are)
- Cordoned tenants (Consider the tenant cordoning label, to tell which tenants are cordoned and which are not)
- Quota Usage (Evaluate how much of a quota spanning a tenant is used vs. the maximum given quota). Same on a per-namespace basis
These are the most important ones I could think of. Another interesting feature (which most other metric exporters lack) is the possibility to add labels to my resource which are then added as metric labels. Let's say I want to be able to show all tenants of a certain customer. I would add the label metrics.clastix.io/customer=a to a tenant CR, and then the label customer with the value a would show up in the metric. This way everyone has much greater flexibility to organize metrics, even if they come from the same controller. You would simply check whether there is any label matching metrics.clastix.io/* and then register it with the actual metric.
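The label idea above can be sketched in Go as follows. This is only an illustration of the prefix matching described in the request, not code from Capsule: the function name MetricLabels and the constant metricLabelPrefix are hypothetical, and only the Go standard library is used.

```go
package main

import (
	"fmt"
	"strings"
)

// metricLabelPrefix is the label prefix proposed in this issue.
// The constant name is hypothetical, not part of Capsule.
const metricLabelPrefix = "metrics.clastix.io/"

// MetricLabels returns the resource labels that start with the prefix,
// keyed by the part of the label key that follows the prefix.
func MetricLabels(resourceLabels map[string]string) map[string]string {
	out := make(map[string]string)
	for key, value := range resourceLabels {
		if !strings.HasPrefix(key, metricLabelPrefix) {
			continue
		}
		name := strings.TrimPrefix(key, metricLabelPrefix)
		if name == "" {
			continue // ignore a bare "metrics.clastix.io/" key
		}
		out[name] = value
	}
	return out
}

func main() {
	tenantLabels := map[string]string{
		"metrics.clastix.io/customer": "a",
		"app.kubernetes.io/name":      "capsule",
	}
	fmt.Println(MetricLabels(tenantLabels)) // prints: map[customer:a]
}
```

Note that Prometheus label names may only contain letters, digits and underscores, so a real implementation would probably also need to sanitize keys such as cost-center before using them as metric labels.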
What would the new user story look like?
Doesn't change.
Expected behavior
A clear and concise description of what you expect to happen.
@oliverbaehler thanks for submitting this request. Implementing them is pretty easy and straightforward. Would you like to submit a PR too?
@bsctl Yes I will try
@oliverbaehler any progress on this issue? Do you need help?
In consultation with @oliverbaehler I would like to take on this topic. However, the implementation does not seem as easy as described by @bsctl. How are metrics implemented in general in Capsule, and how can custom metrics be added?
@prometherion Any idea where to get started?
Hey @adberger, thanks for the ping here!
Since Capsule is built on top of controller-runtime, we already have some controller metrics exported: this is really convenient since we don't have to put in place any additional webserver for metrics exposure, and implement the collector.
I started working on this a few days ago and just pushed the work in progress to the branch issues/451. If you could take a look at it, that would be perfect, so I can share the remarks I have found so far.
Let's take the example of summarizing the total overall count of cordoned and active Tenants: I would expect a metric named capsule_tenants_status_count with two labels, such as cordoned and active. Exposing this kind of metric using prometheus.NewCounterFunc is not possible, since we cannot use the labels that would do the trick.
However, since you're really active on this topic, it would be great to have feedback from you regarding the structure of these metrics: would it be bad to have metrics such as capsule_tenants_active_count and capsule_tenants_cordoned_count?
To me, honestly, having two time series is pretty odd, since Prometheus labels are solving this kind of problem.
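To make the difference concrete, here is a minimal sketch of what a single metric with a label per state would look like when scraped. It does not use the Prometheus client library or any Capsule code: the helpers CountByState and Render are hypothetical, the label name state is an assumption, and the samples are rendered by hand in the Prometheus text exposition format purely for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// CountByState groups tenant states so that one metric can carry the
// state as a label value instead of needing one metric per state.
func CountByState(states []string) map[string]int {
	counts := make(map[string]int)
	for _, s := range states {
		counts[strings.ToLower(s)]++
	}
	return counts
}

// Render formats the counts as text exposition samples, one line per
// label value, sorted by label value for stable output.
func Render(metric string, counts map[string]int) []string {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		lines = append(lines, fmt.Sprintf("%s{state=%q} %d", metric, k, counts[k]))
	}
	return lines
}

func main() {
	// Hypothetical tenant states, as they might be read from status.state.
	states := []string{"Active", "Cordoned", "Active"}
	for _, line := range Render("capsule_tenants_status_count", CountByState(states)) {
		fmt.Println(line)
	}
	// prints:
	// capsule_tenants_status_count{state="active"} 2
	// capsule_tenants_status_count{state="cordoned"} 1
}
```

With this shape, the total is a sum over one metric name, while the two-metric variant needs both names spelled out in the query. Since the number of tenants in a state can also decrease, a real implementation would likely expose it as a gauge rather than a counter.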
Thank you very much. I'll look into it by the 9th of January and give feedback.
@prometherion I might be doing something wrong but I don't see the metrics yet.
I followed https://capsule.clastix.io/docs/contributing/development/#fork-build-and-deploy-capsule with a kind cluster.
Then I did kubectl port-forward service/capsule-controller-manager-metrics-service -n capsule-system 8888:8080 and opened http://localhost:8888/metrics in a browser.
Regarding your question: Having two metrics (capsule_tenants_active_count & capsule_tenants_cordoned_count) seems fine for me. You just need different PromQL queries, but the result stays the same.
I might be doing something wrong but I don't see the metrics yet
Are you referring to the custom or basic ones?
Because for the latter:
curl -s localhost:8888/metrics | wc -l
1618
I meant the custom ones
Solved with https://github.com/kubernetes/kube-state-metrics/blob/main/docs/customresourcestate-metrics.md
Example with kube-prometheus-stack Helm Chart (https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack):
kube-state-metrics:
  rbac:
    extraRules:
      - apiGroups: [ "capsule.clastix.io" ]
        resources: ["tenants"]
        verbs: [ "list", "watch" ]
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
          - groupVersionKind:
              group: capsule.clastix.io
              kind: "Tenant"
              version: "v1beta2"
            labelsFromPath:
              name: [metadata, name]
            metrics:
              - name: "tenant_size"
                help: "Count of namespaces in the tenant"
                each:
                  type: Gauge
                  gauge:
                    path: [status, size]
                commonLabels:
                  custom_metric: "yes"
                labelsFromPath:
                  capsule_tenant: [metadata, name]
                  kind: [ kind ]
              - name: "tenant_state"
                help: "The operational state of the Tenant"
                each:
                  type: StateSet
                  stateSet:
                    labelName: state
                    path: [status, state]
                    list: [Active, Cordoned]
                commonLabels:
                  custom_metric: "yes"
                labelsFromPath:
                  capsule_tenant: [metadata, name]
                  kind: [ kind ]
              - name: "tenant_namespaces_info"
                help: "Namespaces of a Tenant"
                each:
                  type: Info
                  info:
                    path: [status, namespaces]
                    labelsFromPath:
                      tenant_namespace: []
                commonLabels:
                  custom_metric: "yes"
                labelsFromPath:
                  capsule_tenant: [metadata, name]
                  kind: [ kind ]
This is definitely gold, thanks for sharing @adberger: could we transform this issue from a code-based feature to a documentation one?
Looking forward to reviewing a PR from you!
@prometherion Sorry, I currently don't have any intention to make a contribution to the documentation of capsule because I'm quite busy at the moment.