node-feature-discovery E2E infra migration

What would you like to be added: Run the E2E tests on dedicated bare-metal machines as a presubmit job for every future PR.

Why is this needed: The current infra (an AWS cluster) doesn't allow us to run more than one instance of the E2E tests in parallel, because all of them would have to run on the same K8S cluster, and that overlap could break the tests. Because of that, we only have a postsubmit E2E job, which runs after a patch lands in the repository. So far this has worked fine in practice, but there is always a risk that we miss a bug when merging a patch. We therefore need an ephemeral environment to run E2E on, and we need to configure Prow to start E2E as a presubmit.

Work items:

[ ] Apply for compute resources (to run pre-submits) in CNCF https://github.com/cncf/cluster/issues/219
[ ] Add a wrapper script to run E2E on an ephemeral cluster that is cleaned up at the end of each run
[ ] Add a presubmit job in Prow
[ ] Update the current postsubmit job to run as a periodic job (on the AWS cluster)
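The wrapper-script item boils down to a create / test / teardown lifecycle in which teardown runs even when the tests fail. Below is a minimal Python sketch under stated assumptions: it uses minikube profiles (the tool chosen later in this thread), and the helper name `run_e2e_on_ephemeral_cluster`, the `make_test_cmd` callback and the `nfd-e2e` prefix are hypothetical. The command runner is injected so the lifecycle can be exercised without a real cluster.

```python
import subprocess
import uuid
from typing import Callable, Sequence

# Executes one command (argv list) and raises if it fails.
Runner = Callable[[Sequence[str]], None]


def default_runner(argv: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(list(argv), check=True)


def run_e2e_on_ephemeral_cluster(
    make_test_cmd: Callable[[str], Sequence[str]],
    run: Runner = default_runner,
    name_prefix: str = "nfd-e2e",  # hypothetical naming scheme
) -> str:
    """Create a uniquely named minikube profile, run the tests, always delete it.

    make_test_cmd receives the profile name and returns the test command argv.
    Returns the profile name that was used.
    """
    # A unique profile name per run lets several runs share one host
    # without their clusters (or images) overlapping.
    profile = f"{name_prefix}-{uuid.uuid4().hex[:8]}"
    try:
        # Starting inside the try means a half-created cluster is cleaned up too.
        run(["minikube", "start", "-p", profile])
        run(make_test_cmd(profile))
    finally:
        # Teardown runs whether the tests passed, failed or raised.
        run(["minikube", "delete", "-p", profile])
    return profile
```

Putting cluster creation inside the `try` is a deliberate choice: a `minikube start` that fails halfway still gets a matching `minikube delete`, so a flaky run does not leak resources on the shared bare-metal host.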

This is a progress tracking issue and I will keep it open until I finish all the necessary steps.

Sep 28 '22 09:09 fmuyassarov

/assign

Sep 28 '22 09:09 fmuyassarov

This would be really cool.

Sep 28 '22 11:09 marquiz

@marquiz, do you have permission to reopen the issue? I still have some patches pending.

Nov 02 '22 18:11 fmuyassarov

/reopen

Nov 02 '22 18:11 marquiz

@marquiz: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Nov 02 '22 18:11 k8s-ci-robot

Thank you, I didn't know Prow had this command.

Nov 02 '22 18:11 fmuyassarov

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Feb 01 '23 08:02 k8s-triage-robot

/remove-lifecycle stale

Feb 01 '23 12:02 fmuyassarov

Some updates on this:

We have received Equinix Metal servers from CNCF. I've already created a server there running Ubuntu 22.04 and am installing all the bits required to run the e2e tests.
I inspected the server to find the features it has, and based on that I'm going to update the https://github.com/kubernetes/test-infra/pull/27755 PR with the corresponding labels we expect to see on the node.
I found using a single local container registry problematic, because it is hard to share it among many kind clusters running in parallel. Using minikube with its in-cluster Docker seemed a better approach for running multiple k8s clusters: the images are built separately for each cluster, so they don't overlap with each other. They are isolated, and their lifetime is tied to the lifetime of the cluster they live in. The PR is ready here: https://github.com/kubernetes-sigs/node-feature-discovery/pull/1088
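Because each minikube profile runs its own Docker daemon, an image built against that daemon is visible only to that cluster. A minimal sketch of how a wrapper could point `docker build` at one profile, assuming bash-style `minikube docker-env` output (lines of the form `export NAME="value"`); the helper names and the parsing approach are illustrative and not taken from the linked PR.

```python
import os
import re
import subprocess
from typing import Dict

# Matches bash-style lines such as: export DOCKER_HOST="tcp://192.168.49.2:2376"
_EXPORT_RE = re.compile(r'^export\s+([A-Za-z_][A-Za-z0-9_]*)="(.*)"$')


def parse_docker_env(output: str) -> Dict[str, str]:
    """Turn bash-style `minikube docker-env` output into a dict of variables."""
    env: Dict[str, str] = {}
    for line in output.splitlines():
        match = _EXPORT_RE.match(line.strip())
        if match:  # comment lines and blank lines are skipped
            env[match.group(1)] = match.group(2)
    return env


def build_image_in_profile(profile: str, tag: str, context: str = ".") -> None:
    """Build an image with the Docker daemon inside one minikube profile."""
    out = subprocess.run(
        ["minikube", "-p", profile, "docker-env", "--shell", "bash"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Overlay the profile's DOCKER_* variables on the current environment so
    # `docker build` talks to that cluster's daemon, not the host's.
    env = {**os.environ, **parse_docker_env(out)}
    subprocess.run(["docker", "build", "-t", tag, context], env=env, check=True)
```

Passing the variables through `env=` rather than `eval`-ing them in a shared shell keeps each parallel run's Docker target local to its own subprocess calls.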

For the e2e presubmit Prow job, I would suggest using always_run: false, so that it is only triggered on request, unlike the other tests, which are triggered automatically. The reason is that we don't have unlimited resources. Reviewers or PR authors would trigger the job once the PR has gone through review and they feel it is ready to be merged. The job would still be required to pass before Tide merges the PR, but that doesn't prevent anyone from triggering it whenever they want.
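The on-demand policy proposed above can be illustrated with a toy model. This is a simplified Python sketch of the policy only, not Prow's implementation; in a real Prow config the behaviour comes from the `always_run: false` field on the presubmit, and the job names used here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Presubmit:
    name: str
    always_run: bool  # mirrors the Prow field of the same name
    optional: bool = False  # in this model, non-optional jobs gate the merge


def jobs_for_push(jobs: Iterable[Presubmit]) -> List[str]:
    """Jobs started automatically when a PR is opened or updated."""
    return [j.name for j in jobs if j.always_run]


def jobs_for_comment(jobs: Iterable[Presubmit], comment: str) -> List[str]:
    """Jobs started by a `/test <name>` line in a PR comment (toy parser)."""
    requested = set()
    for line in comment.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "/test":
            requested.add(parts[1])
    return [j.name for j in jobs if j.name in requested]


def mergeable(jobs: Iterable[Presubmit], passed: Dict[str, bool]) -> bool:
    """Every non-optional job, manual or automatic, must have passed."""
    return all(passed.get(j.name, False) for j in jobs if not j.optional)
```

In this model, `always_run` only decides when a job starts, while `optional` decides whether it gates the merge, which matches the split described in the comment: the e2e job starts on request but is still required.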

Mar 15 '23 12:03 fmuyassarov

/lifecycle active

Mar 15 '23 12:03 fmuyassarov

@marquiz, maybe a good candidate for v0.14?

Apr 18 '23 10:04 ArangoGutierrez

Yeah, sure, we want this as soon as possible. Nothing to promote to the end users though 😎

Apr 18 '23 12:04 marquiz
