kube-state-metrics icon indicating copy to clipboard operation
kube-state-metrics copied to clipboard

fix: use --track-unscheduled-pods to select unscheduled pods in Daemonset sharding

Open CatherineF-dev opened this issue 1 year ago • 25 comments

What this PR does / why we need it:

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes https://github.com/kubernetes/kube-state-metrics/issues/2353

Tested using minikube

CatherineF-dev avatar May 09 '24 11:05 CatherineF-dev

Will this supersede https://github.com/kubernetes/kube-state-metrics/pull/2373 ?

mrueg avatar May 15 '24 13:05 mrueg

Will this supersede https://github.com/kubernetes/kube-state-metrics/pull/2373 ?

Yes.

CatherineF-dev avatar May 15 '24 13:05 CatherineF-dev

/assign /triage accepted

dgrisonnet avatar May 16 '24 16:05 dgrisonnet

/hold for others to review. /lgtm

mrueg avatar May 16 '24 20:05 mrueg

New changes are detected. LGTM label has been removed.

k8s-ci-robot avatar May 17 '24 11:05 k8s-ci-robot

/lgtm to trigger tide

CatherineF-dev avatar May 17 '24 11:05 CatherineF-dev

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CatherineF-dev, mrueg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar May 17 '24 11:05 k8s-ci-robot

@logicalhan Can we just skip that for now? We've been waiting for 6 weeks to https://github.com/kubernetes/kube-state-metrics/issues/2374 resolved, and we're having daily pain points with KSM right now in our environment. While I appreciate adding tests, we're literally removing code and making it simpler here.. can we bypass this and just get a fix in ASAP?

diranged avatar May 27 '24 22:05 diranged

@logicalhan Can we just skip that for now? We've been waiting for 6 weeks to https://github.com/kubernetes/kube-state-metrics/issues/2374 resolved, and we're having daily pain points with KSM right now in our environment. While I appreciate adding tests, we're literally removing code and making it simpler here.. can we bypass this and just get a fix in ASAP?

How do you know the fix works without tests?

logicalhan avatar May 27 '24 22:05 logicalhan

@logicalhan Can we just skip that for now? We've been waiting for 6 weeks to #2374 resolved, and we're having daily pain points with KSM right now in our environment. While I appreciate adding tests, we're literally removing code and making it simpler here.. can we bypass this and just get a fix in ASAP?

How do you know the fix works without tests?

I think this is one of those vanity test situations... I don't see it worth blocking on this just to get a code coverage line. If you have a test proposal, would you put it up? Otherwise, can we get this feature fixed since it's been broken?

diranged avatar May 27 '24 22:05 diranged

I think this is one of those vanity test situations... I don't see it worth blocking on this just to get a code coverage line. If you have a test proposal, would you put it up? Otherwise, can we get this feature fixed since it's been broken?

This bug exists because we didn't have test coverage what the heck are you talking about. The answer is no.

logicalhan avatar May 27 '24 22:05 logicalhan

If you want it so badly @diranged then you should write some tests. We accept pulls.

logicalhan avatar May 27 '24 22:05 logicalhan

If you want it so badly @diranged then you should write some tests. We accept pulls.

Given that I'm not deeply familiar with the code - that seems like a poor idea. I wish you would reconsider here how much you want to stand on the 'needs tests' statement for such a simple PR.

diranged avatar May 27 '24 22:05 diranged

@CatherineF-dev do you have plans to add tests to this PR to get it into a mergable state?

mrueg avatar Jun 05 '24 12:06 mrueg

Okay! Work in Progress

CatherineF-dev avatar Jun 05 '24 13:06 CatherineF-dev

@CatherineF-dev Is there anything I can do to help here? We had good momentum last week and I am looking to help continue that!

How do you feel about my comments for fixing the unit tests? (ie. adding back || len(*n) == 0 check)

I naively feel that fixing that will help out with e2e tests but I'm sure you have way more context working on this and other related code / architecture.

I am also available for video call if you need someone to use as a sounding board.

LaikaN57 avatar Jun 11 '24 22:06 LaikaN57

I was looking at the E2E test failure https://github.com/kubernetes/kube-state-metrics/actions/runs/9417050145/job/25941445490?pr=2388

Seems something are not cleaned up after Daemonset sharding test.

CatherineF-dev avatar Jun 12 '24 11:06 CatherineF-dev

@LaikaN57 fixed all tests now.

CatherineF-dev avatar Jun 13 '24 22:06 CatherineF-dev

Thanks @CatherineF-dev! I am OOO until Monday, but I did a quick review of the new code. Just 2 small nits but nothing blocking. I can review more fully on Monday if others do not approve by then. 🤞 I relly appreciate all the work you put into this!

LaikaN57 avatar Jun 14 '24 03:06 LaikaN57

@mrueg need a review again. Rebased to HEAD.

CatherineF-dev avatar Jun 19 '24 14:06 CatherineF-dev

/unhold

CatherineF-dev avatar Jun 20 '24 13:06 CatherineF-dev

/lgtm

CatherineF-dev avatar Jun 20 '24 13:06 CatherineF-dev

@CatherineF-dev: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jun 20 '24 13:06 k8s-ci-robot

tests?

@logicalhan Tests now implemented thanks to all of @CatherineF-dev's hard work 🙏 Looks like we are looking for review again now. Thanks!

LaikaN57 avatar Jun 20 '24 18:06 LaikaN57

Heads up: This is currently listed for the v2.13 milestone. In order to get a release out soon, I'll drop all open PRs by Wednesday from that milestone.

mrueg avatar Jul 15 '24 16:07 mrueg

@rexagod have fixed the change request. Thx! Need a lgtm

CatherineF-dev avatar Jul 17 '24 00:07 CatherineF-dev

@rexagod fixed.

CatherineF-dev avatar Jul 24 '24 12:07 CatherineF-dev

LGTM

dgrisonnet avatar Jul 24 '24 15:07 dgrisonnet

/lgtm

mrueg avatar Jul 24 '24 17:07 mrueg

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CatherineF-dev, mrueg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jul 24 '24 17:07 k8s-ci-robot

@CatherineF-dev @mrueg @dgrisonnet FYI, I ran into some very easy to overlook syntax errors in this CL that eagle-eyed jsonnet complained loudly about when trying to customize kube-prometheus.

Please take a look at https://github.com/kubernetes/kube-state-metrics/pull/2454 for a quick fix.

jeffmccune avatar Jul 24 '24 23:07 jeffmccune