node-feature-discovery
E2E infra migration
What would you like to be added: Run E2E tests on dedicated bare-metal machines as a presubmit job for every future PR.

Why is this needed: The current infra (an AWS cluster) doesn't allow us to run more than one instance of the E2E tests in parallel, because they would all need to run on the same K8s cluster and the overlap could break the tests. Because of that we only have a post-submit E2E job, which runs after a patch lands in the repository. In practice this has worked fine so far, but there is always a risk that we miss a bug when merging a patch. As such, we need an ephemeral environment to run E2E on, and we need to configure Prow to start E2E as a pre-submit.
Work items:
- [ ] Apply for compute resources (to run pre-submits) in CNCF https://github.com/cncf/cluster/issues/219
- [ ] Add a wrapper script to run E2E on an ephemeral cluster that is cleaned up at the end of each run
- [ ] Add presubmit job in Prow
- [ ] Update current post submit job to run as periodic (on AWS cluster)
This is a progress tracking issue and I will keep it open until I finish all the necessary steps.
/assign
This would be really cool.
@marquiz do you have access to re-open the issue? I still have some patches hanging around.
/reopen
@marquiz: Reopened this issue.
thank you, didn't know that Prow has this command.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Some updates on this:
- We have received Equinix Metal servers from CNCF. I've already created a server there with Ubuntu 22.04 and am installing all the required bits to be able to run the e2e tests.
- I inspected the server to find the features it has, and based on that I'm going to update the PR https://github.com/kubernetes/test-infra/pull/27755 with the corresponding labels that we expect to have on the node.
- I found using a single local container registry problematic, in the sense that it is hard to share among many kind clusters running in parallel. Using minikube with its in-cluster Docker seemed a better approach for running multiple k8s clusters: images are built separately in each cluster and don't overlap with each other. They are isolated, and their lifetime is tied to the lifetime of the cluster they are in. The PR is ready here: https://github.com/kubernetes-sigs/node-feature-discovery/pull/1088
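To illustrate the per-run isolation described above, here is a minimal sketch of a wrapper that starts a throwaway minikube profile, runs the tests, and always deletes the profile afterwards. It is not the script from the PR: the function name, the profile naming scheme, and the injectable run parameter are assumptions made for illustration, and the test command is left to the caller.

```python
import subprocess
import uuid
from typing import Callable, Optional, Sequence

# A runner takes a command (list of arguments) and raises on failure.
Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def run_e2e_on_ephemeral_cluster(
    test_cmd: Sequence[str],
    profile: Optional[str] = None,
    run: Runner = _run,
) -> str:
    """Run test_cmd against a dedicated minikube profile, then delete it.

    The profile name is hypothetical: a random suffix keeps parallel runs
    from sharing a cluster (and therefore from sharing built images).
    """
    profile = profile or f"nfd-e2e-{uuid.uuid4().hex[:8]}"
    try:
        # Starting inside the try block means a partially created
        # cluster is also cleaned up.
        run(["minikube", "start", "--profile", profile])
        run(list(test_cmd))
    finally:
        # Clean up whether the tests passed or failed.
        run(["minikube", "delete", "--profile", profile])
    return profile
```

A caller would pass whatever command runs the e2e suite, for example run_e2e_on_ephemeral_cluster(["make", "e2e-test"]), where the make target is assumed here rather than taken from this thread. Making the runner injectable keeps the cleanup logic testable without a real cluster.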
For the e2e pre-submit Prow job, I would suggest using always_run: false so that it is only triggered on request, unlike the other tests, which are triggered automatically, because we don't have unlimited resources. Reviewers and PR authors would trigger the job once the PR has gone through review and they feel it is ready to be merged. The job would still have to pass for Tide to merge the PR, but that doesn't stop anyone from triggering it whenever they want.
/lifecycle active
@marquiz maybe a good candidate for V0.14?
Yeah, sure we want this as soon as possible. Nothing to promote for the end users though 😎