
kube-state-metrics with autosharding stops updating shards when the labels of the statefulset are updated

Open pkoutsovasilis opened this issue 1 year ago • 20 comments

What happened:

When kube-state-metrics is deployed with autosharding and I update the labels of the ksm StatefulSet, shard updates (and thus the recalculation of shards) stop reaching pods that are not restarted

What you expected to happen:

I would expect shard updates to happen even when the StatefulSet's labels are updated, since this is allowed in k8s and it does not cause the pods to be restarted

How to reproduce it (as minimally and precisely as possible):

  1. Clone the kube-state-metrics repo: git clone https://github.com/kubernetes/kube-state-metrics && cd kube-state-metrics
  2. With the editor of your choice, add a label prone to change to the metadata of the StatefulSet in examples/autosharding/statefulset.yaml, e.g. app.revision: '1', and set replicas to 2
  3. Deploy ksm with autosharding: kubectl apply -k ./examples/autosharding
  4. Change the label: kubectl patch sts kube-state-metrics -n kube-system --patch '{"metadata": {"labels": {"app.revision": "2"}}}'
  5. Scale the StatefulSet to 3 replicas: kubectl scale statefulsets kube-state-metrics --replicas=3 -n kube-system
  6. Check the logs of pod/kube-state-metrics-0 and pod/kube-state-metrics-1: they did not update their shard calculation. This recovers only after all pods are restarted and the labels are not changed again.
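To illustrate the failure mode, here is a minimal, self-contained Go sketch (not the actual kube-state-metrics code; the helper names are hypothetical) of how a selector captured from the StatefulSet's labels at startup goes stale once a label changes:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selectorFromLabels mimics building an equality-based label selector
// string from a label map (analogous in spirit to labels.SelectorFromSet
// in apimachinery) — a simplified sketch, not the real implementation.
func selectorFromLabels(lbls map[string]string) string {
	keys := make([]string, 0, len(lbls))
	for k := range lbls {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+lbls[k])
	}
	return strings.Join(parts, ",")
}

// matches reports whether an object's labels satisfy a selector built by
// selectorFromLabels.
func matches(selector string, objLabels map[string]string) bool {
	if selector == "" {
		return true // empty selector matches everything
	}
	for _, req := range strings.Split(selector, ",") {
		kv := strings.SplitN(req, "=", 2)
		if objLabels[kv[0]] != kv[1] {
			return false
		}
	}
	return true
}

func main() {
	// At pod startup, the watch selector is captured from the current labels.
	startupLabels := map[string]string{
		"app.kubernetes.io/name": "kube-state-metrics",
		"app.revision":           "1",
	}
	sel := selectorFromLabels(startupLabels)

	// Later the operator bumps app.revision to "2" — the pods are NOT
	// restarted, but the already-running watch still filters on the old
	// selector, so the updated StatefulSet no longer matches it.
	updatedLabels := map[string]string{
		"app.kubernetes.io/name": "kube-state-metrics",
		"app.revision":           "2",
	}
	fmt.Println(matches(sel, updatedLabels)) // false
}
```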

Anything else we need to know?:

Environment:

  • kube-state-metrics version: 2.11.0
  • Kubernetes version (use kubectl version):
    Client Version: v1.29.3
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.27.7-gke.1121002
    
  • Cloud provider or hardware configuration: Google Kubernetes Engine (GKE)
  • Other info: N/A

pkoutsovasilis avatar Mar 29 '24 23:03 pkoutsovasilis

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 29 '24 23:03 k8s-ci-robot

@CatherineF-dev I'm tagging you since I see you are active throughout the issues, and I really thank you in advance; would you know who would be the most appropriate person to look at this issue and the PR I have opened for it? 🙂

pkoutsovasilis avatar Apr 04 '24 23:04 pkoutsovasilis

I can have a look. Will try reproducing this issue to understand what happened.

CatherineF-dev avatar Apr 05 '24 01:04 CatherineF-dev

I prefer label selectors, which are native in k8s.

qq: why do you want to change labels? I want to see whether this is a common use case or not.

CatherineF-dev avatar Apr 09 '24 00:04 CatherineF-dev

I prefer label selectors, which are native in k8s.

qq: why do you want to change labels? I want to see whether this is a common use case or not.

A field selector is native in k8s as well?! Most, if not all, operators update labels to keep track of things, since by k8s's native behavior this does not cause the underlying pods to be restarted. I was trying to embed kube-state-metrics in an operator when I stumbled upon this. In my understanding, a field selector is more robust, as it eliminates any scenario in which pods stop calculating their shards properly. Could you expand a little on why you prefer label selectors over a field selector and, if you can, on what scenario you see where a field selector falls short compared to a label selector?

pkoutsovasilis avatar Apr 09 '24 01:04 pkoutsovasilis

From my experience, a label selector is more dynamic and unique compared to a field selector.

I was considering a case where two KSMs are running (this happens when deploying another KSM using a commercial k8s distribution). Using a label selector is easier than a field selector.

Could you use a label selector and other means to mitigate this issue?

CatherineF-dev avatar Apr 09 '24 01:04 CatherineF-dev

If a field selector is a show-stopper, how about defining a CLI arg that allows specifying labels to be excluded from the label selector — you know, the ones that are prone to change? That way we keep the label selector and give the user the option to mitigate this manually.
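A rough sketch of what such an exclusion could look like — the flag name and helper below are hypothetical illustrations, not from any PR:

```go
package main

import (
	"fmt"
	"reflect"
)

// stableSelectorLabels drops the label keys the user flagged as volatile
// (e.g. via a hypothetical --auto-sharding-label-exclude CLI arg) before
// the label selector is built, so churn on those keys cannot invalidate
// the watch. Sketch only.
func stableSelectorLabels(lbls map[string]string, exclude []string) map[string]string {
	excluded := make(map[string]bool, len(exclude))
	for _, k := range exclude {
		excluded[k] = true
	}
	out := make(map[string]string, len(lbls))
	for k, v := range lbls {
		if !excluded[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	stsLabels := map[string]string{
		"app.kubernetes.io/name": "kube-state-metrics",
		"app.revision":           "2", // prone to change
	}
	stable := stableSelectorLabels(stsLabels, []string{"app.revision"})
	fmt.Println(reflect.DeepEqual(stable,
		map[string]string{"app.kubernetes.io/name": "kube-state-metrics"})) // true
}
```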

pkoutsovasilis avatar Apr 09 '24 01:04 pkoutsovasilis

Change the label: kubectl patch sts kube-state-metrics -n kube-system --patch '{"metadata": {"labels": {"app.revision": "2"}}}'

qq: why do you want to update the label here? What are the use cases of this new label?

CatherineF-dev avatar Apr 09 '24 01:04 CatherineF-dev

As a simple example, it is a common pattern for operators to store some sort of hash in the labels of a k8s object — a StatefulSet in our case — so that the next time a reconcile fires, the operator can compute the hash of the StatefulSet and compare it with the one in the labels. If they match, the operator's reconcile loop can proceed to the next step, if any. Otherwise — and this is our scenario — it performs whatever steps it needs for this object, e.g. updating a secret, and writes the new hash to the labels. Thus, by using labels, which don't cause any workload restart, an operator can keep track of what its reconcile loop has and hasn't processed. So this label is changed by the operator. Does this example help a little bit more? 🙂
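The hash-in-labels pattern described above can be sketched roughly like this (the label key and helper names are made up for illustration):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// specHash computes a short content hash of a serialized spec; operators
// commonly store such a hash in a label on the object they manage.
func specHash(spec []byte) string {
	sum := sha256.Sum256(spec)
	return hex.EncodeToString(sum[:])[:16] // truncated for label-friendliness
}

// needsReconcile compares the stored hash label against the current spec.
func needsReconcile(labels map[string]string, spec []byte) bool {
	return labels["operator.example.com/spec-hash"] != specHash(spec)
}

func main() {
	labels := map[string]string{}
	spec := []byte(`replicas: 2`)

	if needsReconcile(labels, spec) {
		// ...do the reconcile work, then record the hash — a label-only
		// update that does not restart the StatefulSet's pods.
		labels["operator.example.com/spec-hash"] = specHash(spec)
	}
	fmt.Println(needsReconcile(labels, spec)) // false: state is recorded
}
```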

pkoutsovasilis avatar Apr 09 '24 01:04 pkoutsovasilis

From my experience, label selector is more dynamic and unique compared to field selector.

I was considering a case where there are two KSMs running (it happens when deploying another KSM using commercial k8s ). Using label selector is easier than field selector.

Could you use label selector and other ways to mitigate this issue?

I am still trying to wrap my head around this. Here, in the AddFunc of the SharedIndexInformer, and in the UpdateFunc, we check the name — isn't that equivalent to a field selector? If we filter the events by StatefulSet name, what is the purpose of the label selector?

pkoutsovasilis avatar Apr 09 '24 01:04 pkoutsovasilis

And on that note, it gets a little trickier: when somebody deploys kube-state-metrics without any labels on the StatefulSet (they are not mandatory), we get events for all StatefulSets in the namespace?! fields.SelectorFromSet(ss.Labels) with empty ss.Labels returns the Everything() selector. IMO that's another valid reason for this to be a field selector 🙂
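For contrast, a name-based field selector never consults labels at all, so label churn (or absent labels) cannot invalidate it. A toy sketch, not the PR's actual code (the real thing would use the apimachinery fields package):

```go
package main

import "fmt"

// fieldSelectorForName builds an equality field selector on the object
// name, analogous in spirit to fields.OneTermEqualSelector("metadata.name",
// name) — sketch only.
func fieldSelectorForName(name string) string {
	return "metadata.name=" + name
}

// matchesByName evaluates the selector against an object's name only;
// labels never enter the decision, so changing them cannot stale the watch.
func matchesByName(selector, objName string) bool {
	return selector == "metadata.name="+objName
}

func main() {
	sel := fieldSelectorForName("kube-state-metrics")
	fmt.Println(matchesByName(sel, "kube-state-metrics")) // true, regardless of labels
}
```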

pkoutsovasilis avatar Apr 09 '24 02:04 pkoutsovasilis

Here, in the AddFunc of the SharedIndexInformer, and in the UpdateFunc, we check the name — isn't that equivalent to a field selector? If we filter the events by StatefulSet name, what is the purpose of the label selector?

The existing code handles the two-KSMs case more easily:

  • Watch: labelSelector
  • UpdateFunc/AddFunc: a simple check

CatherineF-dev avatar Apr 09 '24 02:04 CatherineF-dev

Why doesn't a field selector support two KSMs easily!? It is bound to a namespace, and it essentially lets through the same events (wrt the label selector) that pass the if checks into the SharedIndexInformer. But I am probably missing something, so could you please help me out?

pkoutsovasilis avatar Apr 09 '24 02:04 pkoutsovasilis

This is a kube-state-metrics deployment with my PR, which contains the field selector, so:

  • two KSMs, separate namespaces, different names

https://github.com/kubernetes/kube-state-metrics/assets/12166023/b2c246d5-ab3d-46f7-82b3-7bd27db2d06b

  • two KSMs, same namespace, different names

https://github.com/kubernetes/kube-state-metrics/assets/12166023/1443d5b1-008b-4162-ba0f-7146ea6a7344

I can do more experiments if you help me with a case that the field selector is not able to cover

pkoutsovasilis avatar Apr 09 '24 02:04 pkoutsovasilis

two KSMs, separate namespaces, different names

I was considering the same-name case. It might be a breaking change for users who use a labelSelector to differentiate two KSMs.

CatherineF-dev avatar Apr 09 '24 13:04 CatherineF-dev

This recovers only after all pods are restarted and the labels are not changed again.

Is it self-recovered?

CatherineF-dev avatar Apr 09 '24 13:04 CatherineF-dev

I was considering the same-name case. It might be a breaking change for users who use a labelSelector to differentiate two KSMs.

Even in that case (different namespace, same name) the field selector under my open PR works as expected @CatherineF-dev

https://github.com/kubernetes/kube-state-metrics/assets/12166023/23196e47-0c3a-4628-a2db-cec909c1fa08

Is it self-recovered?

Nope, if the labels change it does not self-recover; you have to restart the pods for them to pick up the new labels. With a field selector, on the contrary, nothing can break the sharding calculation.

It is simple: you have some criteria for keeping your shards properly updated, and you can set them only at the app's start time. However, the current criteria (aka the labelSelector) are, by k8s design, allowed to change without any pod invalidation, so no restart happens. Such a change can render your criteria invalid, as they may no longer match any object. Every time I think this through, a field selector is what I would choose for keeping the shards of a StatefulSet always in line and correct.

pkoutsovasilis avatar Apr 09 '24 13:04 pkoutsovasilis

Even in that case (different namespace, same name) the field selector under my open PR works as expected

Cool. The tricky case might be same namespace, same name, and different labels.

CatherineF-dev avatar Apr 17 '24 21:04 CatherineF-dev

@CatherineF-dev are you kidding me?

quoting from k8s page

Each object in your cluster has a Name that is unique for that type of resource. Every Kubernetes object also has a UID that is unique across your whole cluster.

For example, you can only have one Pod named myapp-1234 within the same namespace, but you can have one Pod and one Deployment that are each named myapp-1234.

For non-unique user-provided attributes, Kubernetes provides labels and annotations.

From the above, and from personal experience, I deem that k8s does not allow two objects of the same type (a StatefulSet in our case) with the same name in the same namespace. But please, if you know otherwise, educate me

pkoutsovasilis avatar Apr 17 '24 21:04 pkoutsovasilis

You're right, forgot about that.

LGTM.

CatherineF-dev avatar Apr 18 '24 00:04 CatherineF-dev