test-infra icon indicating copy to clipboard operation
test-infra copied to clipboard

Skip DRA from node-kubelet-serial-containerd-alpha job

Open kannon92 opened this issue 2 years ago • 24 comments

https://testgrid.k8s.io/sig-node-presubmits#pr-node-kubelet-serial-containerd-alpha-features

#Fixes https://github.com/kubernetes/kubernetes/issues/120928

The originally failure was caused by the feature toggle and runtime config not being set up.

DRA has a separate job where the right preconditions are set up. Therefore, we will filter out DRA from this job.

kannon92 avatar Sep 28 '23 18:09 kannon92

/cc @pohly @bart0sh

kannon92 avatar Sep 28 '23 18:09 kannon92

/unassign

I'll let @bart0sh handle this, he knows the node tests better.

pohly avatar Sep 29 '23 07:09 pohly

@bart0sh does this PR solve the issue?

kannon92 avatar Sep 29 '23 11:09 kannon92

@bart0sh does this PR solve the issue?

It doesn't. See my comment in the issue: https://github.com/kubernetes/kubernetes/issues/120928#issuecomment-1740830171

bart0sh avatar Sep 29 '23 12:09 bart0sh

PTAL @bart0sh.

We decided to review this job as DRA is covered by a dedicated job and we are promoting DRA to beta in 1.29. This test also has some issues with versions of containerd so we will just remove this test.

kannon92 avatar Sep 29 '23 14:09 kannon92

/retest

kannon92 avatar Sep 29 '23 19:09 kannon92

/lgtm /assign @dims @SergeyKanzhelev

bart0sh avatar Sep 29 '23 21:09 bart0sh

/assign @SergeyKanzhelev @tzneal

dims avatar Sep 29 '23 21:09 dims

@kannon92 Can you update a description of this PR to reflect the changes?

bart0sh avatar Oct 02 '23 10:10 bart0sh

@kannon92 Can you update a description of this PR to reflect the changes?

Done.

kannon92 avatar Oct 02 '23 13:10 kannon92

This job is useful for the new alpha features we may develop in master. Ideally we shouldn't create a new job for every new feature.

Why DRA tests fail?

SergeyKanzhelev avatar Oct 03 '23 19:10 SergeyKanzhelev

This job is useful for the new alpha features we may develop in master. Ideally we shouldn't create a new job for every new feature.

Why DRA tests fail?

The issue has more details on the investigation.

https://github.com/kubernetes/kubernetes/issues/120928#issuecomment-1740830171

DRA tests fail because we don't have the necessary scheduling APIs. But @bart0sh pointed out that there are more reasons that it would fail.

One thing we notice is that the FeatureGates all aplha doesn't work at that well with features that have more component changes. SideCar needs both FeatureGate and Service-Feature-Gate.

https://github.com/kubernetes/test-infra/pull/30908

For DRA, there is another command we need to add to read the scheduling APIs (as the logs say the necessary api is not found).

kannon92 avatar Oct 03 '23 19:10 kannon92

@bart0sh @SergeyKanzhelev We discussed this job and we decided that we should keep this job around for alpha features.

We will exclude DRA from this job rather than deleting it.

kannon92 avatar Oct 05 '23 14:10 kannon92

@bart0sh @SergeyKanzhelev We discussed this job and we decided that we should keep this job around for alpha features.

What would be a possible scenario of using it? If a new test with AlphaFuture is added, this job is most probably going to fail as most of alpha features require a feature gate. Moreover, if such a test is added, a correspondent job will be added as well, which would again make this job redundant. At least I can see this pattern happened twice lately: with sidecar containers and DRA.

bart0sh avatar Oct 05 '23 18:10 bart0sh

@bart0sh @SergeyKanzhelev We discussed this job and we decided that we should keep this job around for alpha features.

What would be a possible scenario of using it? If a new test with AlphaFuture is added, this job is most probably going to fail as most of alpha features require a feature gate. Moreover, if such a test is added, a correspondent job will be added as well, which would again make this job redundant. At least I can see this pattern happened twice lately: with sidecar containers and DRA.

I could be wrong but I don’t think it’s all features that use service feature gates. Sidecar and DRA span multiple components and I think service feature gate is required for those.

One suggestion was to look into a way to fix the AllAlpha so it should work with other components.

kannon92 avatar Oct 11 '23 12:10 kannon92

/retest

bart0sh avatar Dec 08 '23 11:12 bart0sh

@bart0sh what do you want to do with this PR?

kannon92 avatar Dec 11 '23 15:12 kannon92

@kannon92 Let's keep it. Can you change PR subject and description to reflect the changes? After that I'll be happy to lgtm it.

bart0sh avatar Dec 11 '23 15:12 bart0sh

/retest

bart0sh avatar Dec 11 '23 15:12 bart0sh

@kannon92 Let's keep it. Can you change PR subject and description to reflect the changes? After that I'll be happy to lgtm it.

Done.

kannon92 avatar Dec 11 '23 15:12 kannon92

@kannon92 CI failure is unrelated it seems. /lgtm

bart0sh avatar Dec 11 '23 16:12 bart0sh

/assign @SergeyKanzhelev @mrunalp @derekwaynecarr @dchen1107 for a final approval

bart0sh avatar Dec 11 '23 16:12 bart0sh

@kannon92: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-test-infra-misc-image-build-test 73bbb0e0838640b91a97e6669b57da85c8c364ce link true /test pull-test-infra-misc-image-build-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Apr 09 '24 22:04 k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kannon92, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Apr 24 '24 19:04 k8s-ci-robot

@kannon92: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key sig-node-presubmit.yaml using file config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml

In response to this:

https://testgrid.k8s.io/sig-node-presubmits#pr-node-kubelet-serial-containerd-alpha-features

#Fixes https://github.com/kubernetes/kubernetes/issues/120928

The originally failure was caused by the feature toggle and runtime config not being set up.

DRA has a separate job where the right preconditions are set up. Therefore, we will filter out DRA from this job.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 24 '24 20:04 k8s-ci-robot