karmada icon indicating copy to clipboard operation
karmada copied to clipboard

Call shutdown after being stopped

Open whitewindmills opened this issue 9 months ago • 9 comments

What type of PR is this? /kind cleanup /kind bug

What this PR does / why we need it: we forget to call shutdown to close workqueue in these two components.

  • karmada-scheduler
  • karmada-metrics-adapter

Which issue(s) this PR fixes: Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

whitewindmills avatar May 08 '24 06:05 whitewindmills

Codecov Report

Attention: Patch coverage is 0% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 53.19%. Comparing base (81b8c4c) to head (0708b7d). Report is 18 commits behind head on master.

Files Patch % Lines
pkg/scheduler/scheduler.go 0.00% 1 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4916      +/-   ##
==========================================
+ Coverage   53.12%   53.19%   +0.06%     
==========================================
  Files         251      252       +1     
  Lines       20415    20509      +94     
==========================================
+ Hits        10846    10910      +64     
- Misses       8855     8880      +25     
- Partials      714      719       +5     
Flag Coverage Δ
unittests 53.19% <0.00%> (+0.06%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar May 08 '24 07:05 codecov-commenter

/cc @RainbowMango @XiShanYongYe-Chang to take a look

whitewindmills avatar May 09 '24 08:05 whitewindmills

@whitewindmills: GitHub didn't allow me to request PR reviews from the following users: to, take, a, look.

Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @RainbowMango @XiShanYongYe-Chang to take a look

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

karmada-bot avatar May 09 '24 08:05 karmada-bot

Thanks /assign

XiShanYongYe-Chang avatar May 09 '24 08:05 XiShanYongYe-Chang

Just a little question, if the queue is not shut down, what is the impact? Memory leak?

ShutDown will cause workqueue to ignore all new items added to it and immediately instruct the worker goroutines(include metrics goroutine) to exit. it is used to exit the currently running Goroutines gracefully. this is essential for managing resource cleanup and preventing memory leaks.

whitewindmills avatar May 10 '24 02:05 whitewindmills

Thanks a lot~ /lgtm /cc @chaunceyjiang @RainbowMango

Do we need the release note and cherry pick this patch to the previous release?

XiShanYongYe-Chang avatar May 10 '24 02:05 XiShanYongYe-Chang

ping @RainbowMango @chaunceyjiang

whitewindmills avatar May 11 '24 01:05 whitewindmills

Do we need the release note and cherry pick this patch to the previous release?

+1

whitewindmills avatar May 11 '24 01:05 whitewindmills

/assign @RainbowMango

whitewindmills avatar May 14 '24 06:05 whitewindmills

ShutDown will cause workqueue to ignore all new items added to it and immediately instruct the worker goroutines(include metrics goroutine) to exit. it is used to exit the currently running Goroutines gracefully. this is essential for managing resource cleanup and preventing memory leaks.

When the controller is going to stop(close the stopCh), all derived go routines will be closed anyway, not in a graceful way. But does it cause memory leaks?

Another questions, do we need to stop the others, like InformerFactory, TypedInformerManager? In https://github.com/karmada-io/karmada/blob/6f29134f1e3de90029e1e020dfab01e5df6d49cf/pkg/metricsadapter/controller.go#L53-L52

RainbowMango avatar May 14 '24 08:05 RainbowMango

But does it cause memory leaks?

not absolutely, it depends on what is done in this goroutine. but we'd better follow good practices.

do we need to stop the others, like InformerFactory, TypedInformerManager?

  • for InformerFactory, judging from its implementation, it seems that calling the Shutdown function does not make sense. https://github.com/karmada-io/karmada/blob/6f29134f1e3de90029e1e020dfab01e5df6d49cf/pkg/generated/informers/externalversions/factory.go#L146-L152

  • for TypedInformerManager, even if the StopInstance function is called, https://github.com/karmada-io/karmada/blob/6f29134f1e3de90029e1e020dfab01e5df6d49cf/pkg/util/fedinformer/typedmanager/multi-cluster-manager.go#L30-L58 it will not take effect because the controller does not use the internal stopCh channel at all during initialization. this seems to be illegal. maybe we need other PR to modify this part of the logic. https://github.com/karmada-io/karmada/blob/6f29134f1e3de90029e1e020dfab01e5df6d49cf/pkg/metricsadapter/controller.go#L71 https://github.com/karmada-io/karmada/blob/6f29134f1e3de90029e1e020dfab01e5df6d49cf/pkg/metricsadapter/controller.go#L134-L140 context.TODO().Done() is used as a stop semaphore.

whitewindmills avatar May 14 '24 08:05 whitewindmills

not absolutely, it depends on what is done in this goroutine. but we'd better follow good practices.

Yeah, Gracefully shutting down is good. I just want to clarify the impact as well as the benefit, the decision regarding to backport or not is based on the analysis. In my opinion, there won't be any memory leak issue here, so no need to backport this. One of the benefits I can get is that, for the informer, a graceful shutdown could close the connection between the target Kubernetes API server in time, there isn't a need to wait time out for the apiserver side to close the connection.

RainbowMango avatar May 14 '24 09:05 RainbowMango

One of the benefits I can get is that, for the informer, a graceful shutdown could close the connection between the target Kubernetes API server in time, there isn't a need to wait time out for the apiserver side to close the connection.

I also get it.

whitewindmills avatar May 14 '24 09:05 whitewindmills

it will not take effect because the controller does not use the internal stopCh channel at all during initialization. this seems to be illegal. maybe we need other PR to modify this part of the logic.

Just quickly looked at the code, yes, the usage of Stopch and context seems casual, welcome to optimize it a little bit.

RainbowMango avatar May 14 '24 09:05 RainbowMango

ping @RainbowMango

whitewindmills avatar May 15 '24 01:05 whitewindmills

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot avatar May 15 '24 08:05 karmada-bot