kueue
Introduce the ClusterAutoscaler APIs module
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
I replaced the k8s.io/autoscaler/cluster-autoscaler dependency with the k8s.io/autoscaler/cluster-autoscaler/apis module now that https://github.com/kubernetes/autoscaler/pull/6315/ has finally been merged.
Which issue(s) this PR fixes:
Fixes #1345
Special notes for your reviewer:
Does this PR introduce a user-facing change?
NONE
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: tenzen-y
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [tenzen-y]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Deploy Preview for kubernetes-sigs-kueue canceled.
| Name | Link |
|---|---|
| Latest commit | cf651cf1f32a357154055da22c4053970fe387bd |
| Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6601869aba72fc0008a9b5b1 |
https://github.com/kubernetes-sigs/kueue/blob/a751ffd7e902e61e56f24c499eeffb5b7ec5f7af/Makefile#L400-L404
needs an update
Yes, that's right. Also, we need to upgrade the Go version to 1.22 because of https://github.com/kubernetes/autoscaler/commit/eb5d875837a16cbb3a6b0eefb9fde41d9ecbcd85.
go: downloading k8s.io/autoscaler/cluster-autoscaler v0.0.0-20240320105552-09954b6741cb
go: k8s.io/autoscaler/cluster-autoscaler@v0.0.0-20240320105552-09954b6741cb requires go >= 1.22; switching to go1.22.1
go: downloading go1.22.1 (darwin/arm64)
go: downloading k8s.io/autoscaler/cluster-autoscaler v0.0.0-20240320105552-09954b6741cb
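The log above is the go command's toolchain switching at work: since Go 1.21, when a required module's go.mod declares a newer `go` version than the local toolchain, the default `GOTOOLCHAIN=auto` setting downloads and switches to a newer toolchain (here go1.22.1). Below is a minimal stdlib-only Go sketch of the underlying version check, not the go command's actual implementation. The helper names `goDirective`, `parseVersion`, and `atLeast` and the sample versions are hypothetical, and the ordering is simplified: rc/beta suffixes are not modeled and `1.22` is treated as equal to `1.22.0`.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// goDirective returns the version on the `go` line of a go.mod file's
// contents (e.g. "1.22" or "1.22.1"), or "" if there is no go line.
func goDirective(gomod string) string {
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "go" {
			return fields[1]
		}
	}
	return ""
}

// parseVersion splits "1.22.1" into [1 22 1]. Simplification: it stops at the
// first non-numeric component, so suffixes such as "rc1" are not modeled.
func parseVersion(v string) []int {
	var out []int
	for _, part := range strings.Split(v, ".") {
		n, err := strconv.Atoi(part)
		if err != nil {
			break
		}
		out = append(out, n)
	}
	return out
}

// atLeast reports whether have >= want, comparing components left to right
// and treating missing components as 0 (so "1.22" == "1.22.0" here, which is
// coarser than the go command's real ordering).
func atLeast(have, want string) bool {
	h, w := parseVersion(have), parseVersion(want)
	for i := 0; i < len(h) || i < len(w); i++ {
		var hv, wv int
		if i < len(h) {
			hv = h[i]
		}
		if i < len(w) {
			wv = w[i]
		}
		if hv != wv {
			return hv > wv
		}
	}
	return true
}

func main() {
	// Hypothetical dependency go.mod and local toolchain version, for illustration.
	depGoMod := "module k8s.io/autoscaler/cluster-autoscaler\n\ngo 1.22\n"
	local := "1.21.8"

	required := goDirective(depGoMod)
	if atLeast(local, required) {
		fmt.Printf("go%s satisfies go >= %s\n", local, required)
	} else {
		fmt.Printf("dependency requires go >= %s; go%s is too old\n", required, local)
	}
}
```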
Alternatively, it would be better to move the CRD in https://github.com/kubernetes/autoscaler/tree/09954b6741cbb910971916c079f45f6e8878d192/cluster-autoscaler/config/crd to https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/apis/provisioningrequest.
@alculquicondor @x13n WDYT?
Putting the CRD in the CA apis module makes sense to me; it has to be in sync with the code there anyway.
I'm also OK with upgrading to Go 1.22.
But all the APIs should be in the separate module
@x13n @alculquicondor Thank you for responding. I can take both approaches. Let me try them.
Raised at https://github.com/kubernetes/autoscaler/pull/6651
https://github.com/kubernetes/autoscaler/pull/6651 is merged.
@alculquicondor This PR is ready for review.
Thank you so much @tenzen-y!!!!
This was a long effort that simplifies the maintainability of our deps greatly!
I appreciate the collaboration with SIG-Autoscaling (@x13n and @mwielgus)!
@alculquicondor btw maybe you forgot to add the /lgtm label 🙃
cc: @B1F030 @village-way
I did indeed forget :)
/lgtm
LGTM label has been added.
@alculquicondor How about cherry-picking this into the release-0.6 branch?
I'm not too sure about that. Why do you suggest it?
Also, I'm not sure what's the origin of the Job failure.
> I'm not too sure about that. Why do you suggest it?
Cherry-picking would allow cluster admins to avoid the https://github.com/kubernetes-sigs/kueue/issues/1345 error when they implement a separate controller for an in-house job.
> Also, I'm not sure what's the origin of the Job failure.
I'll try to investigate the CI failure.
...
[FAILED] Test Report unavailable because a Ginkgo parallel process disappeared
The aggregated report could not be fetched for a ReportAfterSuite node. A
Ginkgo parallel process disappeared before it could finish reporting.
...
This occurs if a parallel process exits before it reports its results to the
Ginkgo CLI. The CLI will now print out all the stdout/stderr output it's
collected from the running processes. However you may not see anything useful
in these logs because the individual test processes usually intercept output to
stdout/stderr in order to capture it in the spec reports.
Maybe this is the reason 🧐
I guess that this error was caused by CPU throttling, because the integration tests often consume more CPU than the job's limit allows.
https://monitoring-eks.prow.k8s.io/d/96Q8oOOZk/builds?orgId=1&from=1711225845388&to=1711398645388&var-org=kubernetes-sigs&var-repo=kueue&var-job=pull-kueue-test-integration-main&var-build=All&refresh=30s&viewPanel=136
https://github.com/kubernetes/test-infra/blob/ce9cda237cf4b7518433dcd0ee90255e15c9c031/config/jobs/kubernetes-sigs/kueue/kueue-presubmits-main.yaml#L58-L59
It seems that we need to set the CPU limit to 8 cores. @alculquicondor WDYT?
One of my teammates is working on splitting the Multikueue E2E test into its own job. Let's decide whether the limits are OK after that.
That makes sense. /test pull-kueue-test-integration-main
@alculquicondor How about this?
/cherry-pick release-0.6
@alculquicondor: #1872 failed to apply on top of branch "release-0.6":
Applying: Introduce the ClusterAutoscaler APIs module
Using index info to reconstruct a base tree...
M Makefile
M cmd/kueue/main.go
M go.mod
M go.sum
M pkg/controller/admissionchecks/provisioning/controller.go
M pkg/controller/admissionchecks/provisioning/controller_test.go
M pkg/controller/admissionchecks/provisioning/indexer_test.go
M test/integration/controller/admissionchecks/provisioning/provisioning_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/integration/controller/admissionchecks/provisioning/provisioning_test.go
Auto-merging pkg/controller/admissionchecks/provisioning/indexer_test.go
CONFLICT (content): Merge conflict in pkg/controller/admissionchecks/provisioning/indexer_test.go
Auto-merging pkg/controller/admissionchecks/provisioning/controller_test.go
Auto-merging pkg/controller/admissionchecks/provisioning/controller.go
Auto-merging go.sum
CONFLICT (content): Merge conflict in go.sum
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
Auto-merging cmd/kueue/main.go
Auto-merging Makefile
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Introduce the ClusterAutoscaler APIs module
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
In response to this:
/cherry-pick release-0.6
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
ACK