karmada icon indicating copy to clipboard operation
karmada copied to clipboard

Migrate deprecated wait.Poll function

Open Affan-7 opened this issue 1 year ago • 13 comments

What type of PR is this? /kind cleanup

What this PR does / why we need it: This PR replaces the deprecated wait.Poll() function with the wait.PollUntilContextTimeout() function.

Which issue(s) this PR fixes: Related to #3835 1 out of 4 tasks

Does this PR introduce a user-facing change?:

NONE

Affan-7 avatar Aug 07 '23 10:08 Affan-7

Welcome @Affan-7! It looks like this is your first PR to karmada-io/karmada 🎉

karmada-bot avatar Aug 07 '23 10:08 karmada-bot

/lgtm

/assign @rainbowmango

for workflow and approval.

carlory avatar Aug 07 '23 10:08 carlory

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 53.33%. Comparing base (d4c2793) to head (197d335).

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3906   +/-   ##
=======================================
  Coverage   53.33%   53.33%           
=======================================
  Files         252      252           
  Lines       20482    20482           
=======================================
  Hits        10924    10924           
  Misses       8836     8836           
  Partials      722      722           
Flag Coverage Δ
unittests 53.33% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Aug 07 '23 11:08 codecov-commenter

The tests were failing, here are the logs: https://github.com/karmada-io/karmada/actions/runs/5783869909/job/15676511801#step:5:2635

Should we increase the default value of j.Wait? Or there is some other problem?

I can't run e2e tests locally, because they are too slow on my computer.

Affan-7 avatar Aug 07 '23 16:08 Affan-7

It might be flak, let's give it another try.

RainbowMango avatar Aug 08 '23 01:08 RainbowMango

Should we increase the default value of j.Wait? Or there is some other problem?

I'm not sure yet. let's give it another try and try to find the root cause.

I'm curious that the test failed in 42 seconds, but the default waiting time(j.Wait) is 60 seconds.

Just echo the log here: https://github.com/karmada-io/karmada/actions/runs/5787607401/job/15697196471?pr=3906#step:5:2723

[FAILED] [42.440 seconds]
[cluster joined] reschedule testing Deployment propagation testing [AfterEach] testing clusterAffinity of the policy when the ReplicaScheduling of the policy is nil, reschedule testing [needCreateCluster]
  [AfterEach] /home/runner/work/karmada/karmada/test/e2e/rescheduling_test.go:176
  [It] /home/runner/work/karmada/karmada/test/e2e/rescheduling_test.go:271

  Captured StdOut/StdErr Output >>
  I0808 02:23:49.855080   46074 deployment.go:199] The ResourceBinding(karmadatest-2dj87/deploy-tjnp2-deployment) schedule result is: member1,member2
  cluster(member-e2e-lhr) is joined successfully
  I0808 02:23:59.098817   46074 deployment.go:199] The ResourceBinding(karmadatest-2dj87/deploy-tjnp2-deployment) schedule result is: member-e2e-lhr,member1,member2
  I0808 02:23:59.109167   46074 deployment.go:199] The ResourceBinding(karmadatest-2dj87/deploy-tjnp2-deployment) schedule result is: member-e2e-lhr,member1,member2
  E0808 02:23:59.135302   46074 unjoin.go:239] Failed to delete cluster object. cluster name: member-e2e-lhr, error: context deadline exceeded
  E0808 02:23:59.136891   46074 unjoin.go:174] Failed to delete cluster object. cluster name: member-e2e-lhr, error: context deadline exceeded

RainbowMango avatar Aug 08 '23 07:08 RainbowMango

/remove-approve We need to figure out the root cause, probably affected by https://github.com/kubernetes/kubernetes/issues/119533

RainbowMango avatar Aug 10 '23 01:08 RainbowMango

Here is what I did:

Summary:

Value of j.Wait was zero in the both cases wait.Poll and wait.PollUntilContextTimeout(). Since the wait.Poll() is deprecated it no longer returns any errors. That's why wait.Poll() was passing the test and wait.PollUntilContextTimeout() was failing. I specified the value of j.Wait explicitly under the test file and everything works fine as expected.


Longer explanation:

I added fmt.Printf("+++++ value of j.Wait: %v\n", j.Wait) before wait.PollUntilContextTimeout(context.TODO(), 1*time.Second, j.Wait, .....). And I ran the test on the my fork of Karmada, using github actions. I was surprised that the value of j.Wait was zero.

Reference: https://github.com/Affan-7/karmada/actions/runs/5841035785/job/15840850551#step:5:2798

So I run the test again with the deprecated wait.Poll() method. But the test passed! Then I ran it again with fmt.Printf("+++++ value of j.Wait: %v\n", j.Wait) and the value of j.Wait was also zero.

Reference: https://github.com/Affan-7/karmada/actions/runs/5841648595/job/15842010633#step:5:2999

The value of j.Wait was still zero but the test case weren't failing. Because the wait.Poll() method is deprecated and doesn't return errors anymore. I found this in the documentation of wait.Poll.

Deprecated: This method does not return errors from context, use PollWithContextTimeout. Note that the new method will no longer return ErrWaitTimeout and instead return errors defined by the context package. Will be removed in a future release.

Reference: https://pkg.go.dev/k8s.io/apimachinery/pkg/util/wait#Poll

I was assuming that the default value of j.Wait should be 60 because of flags.DurationVar(&j.Wait, "wait", 60*time.Second, ....) under the unjoin.go. But this block of code doesn't get executed during the test (because it's for CLI). Therefore the value of 'j.Wait' remains zero.

So I added the value of j.Wait explicitly under rescheduling_test.go by using Wait: 60 * time.Second,. It's very similar to the Wait: 5 * options.DefaultKarmadactlCommandDuration, on the line 441 of the karmadactl_test.go.

After specifying the value of j.Wait under the rescheduling_test.go all the tests were passing successfully. Reference: https://github.com/Affan-7/karmada/actions/runs/5841642875/job/15841998395#step:5:3461

Affan-7 avatar Aug 12 '23 20:08 Affan-7

Can you please merge this @RainbowMango :blush:. Or is there something that should I improve?

Affan-7 avatar Aug 16 '23 04:08 Affan-7

Thanks @Affan-7 for the excellent analysis, I think it is acceptable to set a timeout here.

I wonder if this is affected by https://github.com/kubernetes/kubernetes/issues/119533?

I'm hesitant to do it before https://github.com/kubernetes/kubernetes/issues/119533.

RainbowMango avatar Aug 16 '23 08:08 RainbowMango

Hi @Affan-7 We can get back on this now as we already updated the Kubernetes dependencies to v1.29+ which includes the fix we want. Please feel free to let me know if you can work on this.

RainbowMango avatar May 06 '24 12:05 RainbowMango

Just rebased this and fixed the golint issues. /assign @XiShanYongYe-Chang

RainbowMango avatar May 22 '24 03:05 RainbowMango

/approve

RainbowMango avatar May 22 '24 07:05 RainbowMango

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot avatar May 22 '24 07:05 karmada-bot