cluster-api
TestReconcileUpdateObservedGeneration is flaky
Looking at https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-pr-test-main&width=20 it seems that TestReconcileUpdateObservedGeneration is really flaky.
Two types of errors can be seen in the job:
controller_test.go:162:
    Unexpected error:
        <*errors.StatusError | 0xc0007fe3c0>: {
            ErrStatus: {
                TypeMeta: {Kind: "", APIVersion: ""},
                ListMeta: {
                    SelfLink: "",
                    ResourceVersion: "",
                    Continue: "",
                    RemainingItemCount: nil,
                },
                Status: "Failure",
                Message: "KubeadmControlPlane.controlplane.cluster.x-k8s.io \"kcp-foo-z7deww\" not found",
                Reason: "NotFound",
                Details: {
                    Name: "kcp-foo-z7deww",
                    Group: "controlplane.cluster.x-k8s.io",
                    Kind: "KubeadmControlPlane",
                    UID: "",
                    Causes: nil,
                    RetryAfterSeconds: 0,
                },
                Code: 404,
            },
        }
    KubeadmControlPlane.controlplane.cluster.x-k8s.io "kcp-foo-z7deww" not found
occurred
and
controller_test.go:178:
    Timed out after 10.001s.
    Expected
        <int64>: 0
    to equal
        <int64>: 1
/kind failing-test
/kind flake

@sbueringer @davideimola
Hi @fabriziopandini, I think the first issue is caused by these lines:
https://github.com/kubernetes-sigs/cluster-api/blob/e4ae2abb15c26e7e1d59dd90cbd050e6be007552/controlplane/kubeadm/controllers/controller_test.go#L157-L163
After creating the object we are not reconciling yet (I think that is required), and maybe we could move the generation read inside the Eventually; see the sketch below.
The other issue might be fixable by increasing the timeout value, but I am not so sure about that...
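A minimal sketch of moving the read inside the Eventually; the helper name, the client/object parameters, and the v1beta1 import path are placeholders, not the actual code of the test at that commit:

```go
package controllers

import (
	"context"
	"testing"
	"time"

	. "github.com/onsi/gomega"
	"sigs.k8s.io/controller-runtime/pkg/client"

	controlplanev1 "sigs.k8s.io/cluster-api/controlplane/kubeadm/api/v1beta1"
)

// waitForObservedGeneration reads the object and its observedGeneration inside
// the Eventually, so a transient "not found" right after creation simply
// retries instead of failing the test on the spot.
func waitForObservedGeneration(t *testing.T, c client.Client, kcp *controlplanev1.KubeadmControlPlane) {
	g := NewWithT(t)
	ctx := context.Background()

	g.Eventually(func() int64 {
		current := &controlplanev1.KubeadmControlPlane{}
		if err := c.Get(ctx, client.ObjectKeyFromObject(kcp), current); err != nil {
			return -1 // not visible yet (or transient error): keep polling
		}
		return current.Status.ObservedGeneration
	}, 10*time.Second, 100*time.Millisecond).Should(Equal(int64(1)))
}
```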
/assign
/lifecycle active
The fix should really be done in testenv/controller-runtime (use the live client, or fix kubernetes/kubernetes#80609), so I'm temporarily disabling the test to unblock PRs. A rough sketch of what reading through the live client looks like is below.
/assign
/lifecycle active
/milestone v0.4
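As context for the "use the live client" part: controller-runtime's manager exposes an uncached reader via GetAPIReader(), which talks to the API server directly instead of the informer cache. A rough sketch under assumed names (getLive, mgr, and the v1beta1 import path are placeholders, not the actual cluster-api testenv helpers):

```go
package controllers

import (
	"context"
	"testing"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"

	controlplanev1 "sigs.k8s.io/cluster-api/controlplane/kubeadm/api/v1beta1"
)

// getLive reads the object straight from the API server via the manager's
// API reader, bypassing the informer cache that can lag behind recent writes.
func getLive(ctx context.Context, t *testing.T, mgr manager.Manager, key client.ObjectKey) *controlplanev1.KubeadmControlPlane {
	t.Helper()

	current := &controlplanev1.KubeadmControlPlane{}
	if err := mgr.GetAPIReader().Get(ctx, key, current); err != nil {
		t.Fatalf("failed to get KubeadmControlPlane %s: %v", key, err)
	}
	return current
}
```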
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle frozen
/close
It seems the test is stable lately
@fabriziopandini: Closing this issue.
In response to this:
/close
It seems the test is stable lately
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
The test is stable because we skipped it: https://github.com/kubernetes-sigs/cluster-api/blob/f73a27729acf3bc2cce2479220e1460eb6f73efb/controlplane/kubeadm/internal/controllers/controller_test.go#L159
Fine for me to close the issue anyway if we decide we don't want to track the skipped test. (just don't want to close it because it's stable when it's actually skipped :))
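For reference, a hedged sketch of the shape of that temporary skip (the real message and surrounding code in the linked file may differ):

```go
package controllers

import "testing"

// The actual skip message and issue reference may differ; this only shows the shape.
func TestReconcileUpdateObservedGeneration(t *testing.T) {
	t.Skip("flaky test, temporarily disabled to unblock PRs; see the tracking issue")

	// ... original test body ...
}
```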
@sbueringer: Reopened this issue.
In response to this:
/reopen
The test is stable because we skipped it: https://github.com/kubernetes-sigs/cluster-api/blob/f73a27729acf3bc2cce2479220e1460eb6f73efb/controlplane/kubeadm/internal/controllers/controller_test.go#L159
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/triage accepted
(doing some cleanup on old issues without updates)
/close
@fabriziopandini: Closing this issue.
In response to this:
(doing some cleanup on old issues without updates)
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.