Introduce the ClusterAutoscaler APIs module

Open tenzen-y opened this issue 1 year ago • 8 comments

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

I replaced the k8s.io/autoscaler/cluster-autoscaler dependency with the k8s.io/autoscaler/cluster-autoscaler/apis module, now that https://github.com/kubernetes/autoscaler/pull/6315/ is merged.

Which issue(s) this PR fixes:

Fixes #1345

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

tenzen-y avatar Mar 20 '24 16:03 tenzen-y

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Mar 20 '24 16:03 k8s-ci-robot

Deploy Preview for kubernetes-sigs-kueue canceled.

Latest commit: cf651cf1f32a357154055da22c4053970fe387bd
Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6601869aba72fc0008a9b5b1

netlify[bot] avatar Mar 20 '24 16:03 netlify[bot]

https://github.com/kubernetes-sigs/kueue/blob/a751ffd7e902e61e56f24c499eeffb5b7ec5f7af/Makefile#L400-L404

needs an update

trasc avatar Mar 20 '24 17:03 trasc

> https://github.com/kubernetes-sigs/kueue/blob/a751ffd7e902e61e56f24c499eeffb5b7ec5f7af/Makefile#L400-L404
>
> needs an update

Yes, that's right. Also, we need to upgrade the Go version to 1.22 because of https://github.com/kubernetes/autoscaler/commit/eb5d875837a16cbb3a6b0eefb9fde41d9ecbcd85.

go: downloading k8s.io/autoscaler/cluster-autoscaler v0.0.0-20240320105552-09954b6741cb
go: k8s.io/autoscaler/cluster-autoscaler@v0.0.0-20240320105552-09954b6741cb requires go >= 1.22; switching to go1.22.1
go: downloading go1.22.1 (darwin/arm64)
go: downloading k8s.io/autoscaler/cluster-autoscaler v0.0.0-20240320105552-09954b6741cb
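
To make the "requires go >= 1.22" line above concrete, here is a minimal sketch of the kind of comparison involved: the local Go version is checked against the minimum version a dependency declares. The function names and the local version "1.21.8" are hypothetical, chosen only for illustration, and the parser handles plain dotted versions only (no release candidates). It is not the Go toolchain's actual implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGoVersion turns a version such as "1.22", "1.22.1" or "go1.22.1"
// into comparable integer parts. Missing parts are treated as zero.
func parseGoVersion(v string) ([3]int, error) {
	var parts [3]int
	fields := strings.Split(strings.TrimPrefix(v, "go"), ".")
	if len(fields) > 3 {
		return parts, fmt.Errorf("unexpected version %q", v)
	}
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return parts, fmt.Errorf("unexpected version %q: %w", v, err)
		}
		parts[i] = n
	}
	return parts, nil
}

// needsNewerToolchain reports whether the local Go version is older than
// the minimum version that a dependency declares.
func needsNewerToolchain(local, required string) (bool, error) {
	l, err := parseGoVersion(local)
	if err != nil {
		return false, err
	}
	r, err := parseGoVersion(required)
	if err != nil {
		return false, err
	}
	// Compare major, minor, patch in order; the first difference decides.
	for i := range l {
		if l[i] != r[i] {
			return l[i] < r[i], nil
		}
	}
	return false, nil
}

func main() {
	// Hypothetical local version; the log above does not state it.
	older, err := needsNewerToolchain("1.21.8", "1.22")
	if err != nil {
		panic(err)
	}
	fmt.Println("needs a newer toolchain:", older)
}
```

With a local version older than 1.22, the check reports that a newer toolchain is needed, which matches the "switching to go1.22.1" step in the log.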

Or it would be better to put the CRDs from https://github.com/kubernetes/autoscaler/tree/09954b6741cbb910971916c079f45f6e8878d192/cluster-autoscaler/config/crd into https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/apis/provisioningrequest.

@alculquicondor @x13n WDYT?

tenzen-y avatar Mar 20 '24 23:03 tenzen-y

Putting the CRD in the CA apis module makes sense to me; it has to be in sync with the code there anyway.

x13n avatar Mar 21 '24 10:03 x13n

I'm also OK with upgrading to Go 1.22.

But all the APIs should be in the separate module.

alculquicondor avatar Mar 21 '24 12:03 alculquicondor

@x13n @alculquicondor Thank you for responding. I can take both approaches. Let me try them.

tenzen-y avatar Mar 21 '24 13:03 tenzen-y

> @x13n @alculquicondor Thank you for responding. I can take both approaches. Let me try them.

Raised at https://github.com/kubernetes/autoscaler/pull/6651

tenzen-y avatar Mar 24 '24 14:03 tenzen-y

https://github.com/kubernetes/autoscaler/pull/6651 is merged.

mwielgus avatar Mar 25 '24 11:03 mwielgus

> kubernetes/autoscaler#6651 is merged.

Thank you!

tenzen-y avatar Mar 25 '24 13:03 tenzen-y

@alculquicondor This PR is ready for review.

tenzen-y avatar Mar 25 '24 14:03 tenzen-y

> Thank you so much @tenzen-y!!!!
>
> This was a long effort that simplifies the maintainability of our deps greatly!

I appreciate the collaboration with SIG-Autoscaling (@x13n and @mwielgus)!

@alculquicondor btw maybe you forgot to add the /lgtm label 🙃

tenzen-y avatar Mar 25 '24 18:03 tenzen-y

cc: @B1F030 @village-way

tenzen-y avatar Mar 25 '24 18:03 tenzen-y

I did indeed forget :)

/lgtm

alculquicondor avatar Mar 25 '24 18:03 alculquicondor

LGTM label has been added.

Git tree hash: 2111d2abc5d15cc89d44ac302fa4c42e8b44d122

k8s-ci-robot avatar Mar 25 '24 18:03 k8s-ci-robot

@alculquicondor How about cherry-picking this into the release-0.6 branch?

tenzen-y avatar Mar 25 '24 19:03 tenzen-y

I'm not too sure about that. Why do you suggest it?

Also, I'm not sure what the origin of the Job failure is.

alculquicondor avatar Mar 25 '24 19:03 alculquicondor

> I'm not too sure about that. Why do you suggest it?

Cherry-picking would allow cluster admins to avoid the https://github.com/kubernetes-sigs/kueue/issues/1345 error when they implement a separate controller for an in-house job.

[MODIFIED COMMENT]

tenzen-y avatar Mar 25 '24 20:03 tenzen-y

> Also, I'm not sure what the origin of the Job failure is.

I'll try to investigate the CI failure.

tenzen-y avatar Mar 25 '24 20:03 tenzen-y

...
[FAILED] Test Report unavailable because a Ginkgo parallel process disappeared
    The aggregated report could not be fetched for a ReportAfterSuite node.  A
    Ginkgo parallel process disappeared before it could finish reporting.
...
This occurs if a parallel process exits before it reports its results to the
Ginkgo CLI.  The CLI will now print out all the stdout/stderr output it's
collected from the running processes.  However you may not see anything useful
in these logs because the individual test processes usually intercept output to
stdout/stderr in order to capture it in the spec reports.

Maybe this is the reason 🧐

tenzen-y avatar Mar 25 '24 20:03 tenzen-y

I guess this error was caused by CPU throttling, because the integration tests often use more CPU.

https://monitoring-eks.prow.k8s.io/d/96Q8oOOZk/builds?orgId=1&from=1711225845388&to=1711398645388&var-org=kubernetes-sigs&var-repo=kueue&var-job=pull-kueue-test-integration-main&var-build=All&refresh=30s&viewPanel=136

https://github.com/kubernetes/test-infra/blob/ce9cda237cf4b7518433dcd0ee90255e15c9c031/config/jobs/kubernetes-sigs/kueue/kueue-presubmits-main.yaml#L58-L59

tenzen-y avatar Mar 25 '24 20:03 tenzen-y

> https://monitoring-eks.prow.k8s.io/d/96Q8oOOZk/builds?orgId=1&from=1711225845388&to=1711398645388&var-org=kubernetes-sigs&var-repo=kueue&var-job=pull-kueue-test-integration-main&var-build=All&refresh=30s&viewPanel=136

It seems that we need to set the CPU limit to 8 cores. @alculquicondor WDYT?

tenzen-y avatar Mar 25 '24 20:03 tenzen-y

One of my teammates is working on splitting the MultiKueue E2E test into its own job. Let's decide if the limits are ok after that.

alculquicondor avatar Mar 25 '24 20:03 alculquicondor

> One of my teammates is working on splitting the MultiKueue E2E test into its own job. Let's decide if the limits are ok after that.

That makes sense.

/test pull-kueue-test-integration-main

tenzen-y avatar Mar 25 '24 20:03 tenzen-y

> > I'm not too sure about that. Why do you suggest it?
>
> Cherry-picking would allow cluster admins to avoid the #1345 error when they implement a separate controller for an in-house job.
>
> [MODIFIED COMMENT]

@alculquicondor How about this?

tenzen-y avatar Mar 25 '24 20:03 tenzen-y

/cherry-pick release-0.6

alculquicondor avatar Mar 26 '24 12:03 alculquicondor

@alculquicondor: #1872 failed to apply on top of branch "release-0.6":

Applying: Introduce the ClusterAutoscaler APIs module
Using index info to reconstruct a base tree...
M	Makefile
M	cmd/kueue/main.go
M	go.mod
M	go.sum
M	pkg/controller/admissionchecks/provisioning/controller.go
M	pkg/controller/admissionchecks/provisioning/controller_test.go
M	pkg/controller/admissionchecks/provisioning/indexer_test.go
M	test/integration/controller/admissionchecks/provisioning/provisioning_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/integration/controller/admissionchecks/provisioning/provisioning_test.go
Auto-merging pkg/controller/admissionchecks/provisioning/indexer_test.go
CONFLICT (content): Merge conflict in pkg/controller/admissionchecks/provisioning/indexer_test.go
Auto-merging pkg/controller/admissionchecks/provisioning/controller_test.go
Auto-merging pkg/controller/admissionchecks/provisioning/controller.go
Auto-merging go.sum
CONFLICT (content): Merge conflict in go.sum
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
Auto-merging cmd/kueue/main.go
Auto-merging Makefile
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Introduce the ClusterAutoscaler APIs module
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
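
For anyone picking up the manual cherry-pick, the files that need attention can be pulled out of a log like the one above. The following is a minimal sketch under stated assumptions: the helper name `conflictedFiles` is hypothetical, and it only matches the "CONFLICT (content)" lines shown in this log, not other conflict types git can report.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// conflictMarker is the prefix git prints for a content conflict,
// as seen in the bot output above.
const conflictMarker = "CONFLICT (content): Merge conflict in "

// conflictedFiles returns the paths that a `git am` log reports as
// content conflicts, in the order they appear.
func conflictedFiles(log string) []string {
	var files []string
	scanner := bufio.NewScanner(strings.NewReader(log))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if strings.HasPrefix(line, conflictMarker) {
			files = append(files, strings.TrimPrefix(line, conflictMarker))
		}
	}
	return files
}

func main() {
	// Excerpt of the log quoted in this thread.
	log := `Auto-merging pkg/controller/admissionchecks/provisioning/indexer_test.go
CONFLICT (content): Merge conflict in pkg/controller/admissionchecks/provisioning/indexer_test.go
Auto-merging go.sum
CONFLICT (content): Merge conflict in go.sum
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
Auto-merging cmd/kueue/main.go`
	for _, f := range conflictedFiles(log) {
		fmt.Println(f)
	}
}
```

For this log the helper lists indexer_test.go, go.sum and go.mod, the three files that git could not merge automatically onto release-0.6.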

In response to this:

> /cherry-pick release-0.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ACK

tenzen-y avatar Mar 26 '24 12:03 tenzen-y