controller-runtime icon indicating copy to clipboard operation
controller-runtime copied to clipboard

🌱 Implement missing client methods for komega interface

Open Danil-Grigorev opened this issue 2 years ago • 9 comments

This PR adds some missing interface members for komega, making it more suitable for general use in tests.

Danil-Grigorev avatar Sep 05 '23 18:09 Danil-Grigorev

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Danil-Grigorev Once this PR has been reviewed and has the lgtm label, please assign fillzpp for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Sep 05 '23 18:09 k8s-ci-robot

@Danil-Grigorev: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-controller-runtime-apidiff 721fc30f45798eb86b688ec2504e02f0ebc00ba5 link false /test pull-controller-runtime-apidiff

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Sep 05 '23 18:09 k8s-ci-robot

/assign @schrej

sbueringer avatar Sep 21 '23 18:09 sbueringer

The idea behind these functions was to help with handling eventual consistency in the apiserver. After an object is created, it can take multiple GET calls until it's actually returned. In addition, when you test controllers, it can take a while until some modification is performed. When running actions like create or delete it's usually not necessary to retry them, since they are very likely to be successful in a test setup. Instead, you're much more likely to wait until a deleted object is actually gone, which can be done using Eventually(Get()).Should(NotSucceed()). For the same reason there are no Create functions. Update is a little out of line - I think the main motivation was the Get part in the beginning of it.

If you have a good use-case that requires these methods, we can add them. In that case I'd go all the way though, and also add a method to create objects (and add package level functions for all of them).

schrej avatar Sep 25 '23 22:09 schrej

I think komega in our case is used to wait for some resource to reach expected state during e2e run. Our setup requires a controller to be started, which in turn will create a resource, and then we want to delete it. So something like

By("Removing cluster record, and waiting for it to be removed")
Eventually(func() error {
	return bootstrapClusterProxy.GetClient().Delete(ctx, cluster)
}, e2eConfig.GetIntervals(bootstrapClusterProxy.GetName(), "wait-controllers")...).Should(Succeed())
Eventually(Get(rancherCluster), e2eConfig.GetIntervals(bootstrapClusterProxy.GetName(), "wait-controllers")...).Should(MatchError(ContainSubstring(…..

could be simplified to

By("Removin cluster record, and waiting for it to be removed")
Eventually(Delete(cluster), e2eConfig.GetIntervals(bootstrapClusterProxy.GetName(), "wait-controllers")...).Should(Succeed())
Eventually(Get(cluster), e2eConfig.GetIntervals(bootstrapClusterProxy.GetName(), "wait-controllers")...).Should(MatchError(ContainSubstring( …..

which improves readability without sacrificing functionality. It additionally helps to rule out some transient errors, like 408 from client opened to a distant and resource constraint k8s cluster, which in turn may mark a long running e2e task as failed. This scenario could be handled with an additional get call above, but I’d prefer to have options.

As for create scenario, I agree, this should be added, if you think this is a sufficient use-case to get this in. Could help in situations when a webhook configuration is slow to come up in a freshly deployed controller instance, and we immediately want to apply some custom resource it should process.

Danil-Grigorev avatar Sep 26 '23 22:09 Danil-Grigorev

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 22 '23 18:11 k8s-ci-robot

Sorry, I lost track of this PR. That's a fair point. My question would be: why not wait until the object exists before trying to delete it? So instead of your example, it would look like this:

By("Removin cluster record, and waiting for it to be removed")
Eventually(Get(cluster), e2eConfig.GetIntervals(bootstrapClusterProxy.GetName(), "wait-controllers")...).Should(Succeed))
Expect(Delete(cluster), e2eConfig.GetIntervals(bootstrapClusterProxy.GetName(), "wait-controllers")...).To(Succeed())
Eventually(Get(cluster), e2eConfig.GetIntervals(bootstrapClusterProxy.GetName(), "wait-controllers")...).Should(MatchError(ContainSubstring( …..

Don't get me wrong, I'm not fundamentally against adding more functions, I just don't see the point of waiting for deletion or creation to succeed. In a working test setup I'd just expect those calls to work. If they don't work every time, the test setup is broken. For calls like Get and the other helpers in this package, it's expected that they do not succeed immediately, even if the test setup is working perfectly fine.

The example above also makes the output of tests much clearer in my opinion. If the controller fails to create the resource you end up with a timeout on fetching the object, indicating that it never existed. With your approach you'd get a timeout on deleting the object, but it's not clear if it's due to an api server error, or because it never existed in the first place.

schrej avatar Nov 23 '23 22:11 schrej

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 21 '24 22:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 22 '24 23:03 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Apr 22 '24 00:04 k8s-triage-robot

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 22 '24 00:04 k8s-ci-robot