feature: unjoin a cluster gracefully

Open Garrybest opened this issue 2 years ago • 20 comments

What would you like to be added: Currently, karmadactl unjoin deletes the cluster directly, but the scheduling result in the resource binding still remains. I would like to make unjoin more graceful. The procedure could follow what kubectl drain does.

  1. Cordon the target cluster.
  2. Delete the scheduling result in RB/CRB which has been scheduled to the target cluster.
  3. Delete the cluster.
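The three steps above can be sketched roughly as follows. This is only an in-memory illustration: `Binding`, `ControlPlane`, and `graceful_unjoin` are hypothetical names made up for this sketch and are not part of the Karmada API or of karmadactl.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Binding:
    """Hypothetical stand-in for an RB/CRB; `clusters` is its scheduling result."""
    name: str
    clusters: List[str] = field(default_factory=list)


@dataclass
class ControlPlane:
    """Hypothetical in-memory control plane (not the real Karmada API)."""
    clusters: Dict[str, bool] = field(default_factory=dict)  # name -> schedulable
    bindings: List[Binding] = field(default_factory=list)


def graceful_unjoin(cp: ControlPlane, target: str) -> List[str]:
    """Unjoin `target` in three steps and return the names of the bindings changed."""
    if target not in cp.clusters:
        raise KeyError(f"unknown cluster: {target}")

    # Step 1: cordon the cluster so nothing new is scheduled to it.
    cp.clusters[target] = False

    # Step 2: drop the target cluster from every scheduling result.
    touched = []
    for binding in cp.bindings:
        if target in binding.clusters:
            binding.clusters = [c for c in binding.clusters if c != target]
            touched.append(binding.name)

    # Step 3: delete the cluster object itself.
    del cp.clusters[target]
    return touched
```

The ordering is the point of the sketch: cordoning first means no new scheduling result can land on the cluster while the existing ones are being removed.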

Why is this needed: To make unjoining a cluster more graceful.

Garrybest avatar Jun 13 '22 09:06 Garrybest

/cc @lonelyCZ

Garrybest avatar Jun 13 '22 09:06 Garrybest

Related to #1762.

Garrybest avatar Jun 13 '22 09:06 Garrybest

> Delete the scheduling result in RB/CRB which has been scheduled to the target cluster.

Should we execute all these actions in the CLI tool, or is there a controller that clears the scheduling result in RB/CRB?

lonelyCZ avatar Jun 13 '22 09:06 lonelyCZ

I think so, or we may add a NoExecute taint to this cluster and wait for the eviction. Which one do you prefer?

FYI, in k/k, kubectl drain deletes pods in the CLI tool.

Garrybest avatar Jun 13 '22 10:06 Garrybest

> FYI, in k/k, kubectl drain delete pods in CLI tool.

Looks good, I am going to research it.

lonelyCZ avatar Jun 13 '22 14:06 lonelyCZ

@lonelyCZ @Garrybest Can I pick this up?

carlory avatar Jun 13 '22 16:06 carlory

@carlory Sure, thanks for your interest in it. You can /assign it to yourself. Looking forward to your contribution!

lonelyCZ avatar Jun 14 '22 01:06 lonelyCZ

/help /assign @carlory

Thanks, glad to hear that. FYI, https://github.com/kubernetes/kubernetes/blob/050f930f8968874855eb215f0c0f0877bcdaa0e8/staging/src/k8s.io/kubectl/pkg/cmd/drain/drain.go#L293-L334

Garrybest avatar Jun 14 '22 02:06 Garrybest

@Garrybest: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

karmada-bot avatar Jun 14 '22 02:06 karmada-bot

Hi @carlory, any progress here?

Garrybest avatar Jul 19 '22 02:07 Garrybest

So sorry, I forgot about it.

I will do it this week. Is that still OK?

@Garrybest

carlory avatar Jul 19 '22 02:07 carlory

Thanks @carlory, take your time😄

Garrybest avatar Jul 19 '22 03:07 Garrybest

This is another eviction scenario. Do you know where I can find the documentation on PDB and the eviction subresource? Could you share a link? @Garrybest

RainbowMango avatar Jul 19 '22 08:07 RainbowMango

Eviction is a subresource of Pod.

https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets

Cluster managers and hosting providers should use tools which respect PodDisruptionBudgets by calling the Eviction API instead of directly deleting pods or deployments.
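As a rough illustration of why eviction differs from a plain delete, here is a toy model of the budget check. `Budget` and `try_evict` are hypothetical names for this sketch, and it assumes a budget that only sets minAvailable; the real Eviction API performs this check server-side and rejects the request with HTTP 429 when the budget would be violated.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Toy model of a PodDisruptionBudget that only sets minAvailable."""
    min_available: int
    healthy: int  # number of currently healthy pods covered by the budget


def try_evict(budget: Budget) -> bool:
    """Evict one pod if the budget allows it; return whether it was evicted."""
    if budget.healthy - 1 < budget.min_available:
        # The real Eviction API would answer 429 Too Many Requests here.
        return False
    budget.healthy -= 1
    return True
```

A direct delete would skip this check entirely, which is why drain-style tools go through eviction.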

Garrybest avatar Jul 19 '22 08:07 Garrybest

Is there a solution for this issue?

I'm wondering whether it's the responsibility of the karmada-scheduler to remove the legacy scheduling results.

RainbowMango avatar Jul 25 '22 11:07 RainbowMango

I don't think it's the scheduler's duty. karmadactl unjoin should work like kubectl drain does:

  1. Cordon the target cluster.
  2. Delete the scheduling result in RB/CRB which has been scheduled to the target cluster.
  3. Delete the cluster.

Garrybest avatar Jul 26 '22 02:07 Garrybest

Do you mean you want the karmadactl unjoin CLI to do the cleanup work?

I suppose the CLI is a kind of one-off job: it can't guarantee that all the cleanup work finishes in a single request, and given the potentially huge number of RB/CRB objects, it might make users wait on work they shouldn't need to care about.

So my suggestion is to put the cleanup job in the controller-manager (maybe the cluster-controller?) and leave the scheduler out of it so it can focus on scheduling.

RainbowMango avatar Jul 26 '22 02:07 RainbowMango

I have talked with @RainbowMango privately. We now think it's better to do the cleanup work in the cluster_controller, before the finalizer is removed, instead of in the CLI tool.
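To make the idea concrete, here is a minimal sketch of a finalizer-gated cleanup. It is not the actual cluster_controller code: `Cluster`, `reconcile`, and the `FINALIZER` string are hypothetical, and the bindings are modeled as a plain dict of scheduling results.

```python
from dataclasses import dataclass, field
from typing import Dict, List

FINALIZER = "example.io/cluster-cleanup"  # hypothetical finalizer name


@dataclass
class Cluster:
    """Hypothetical cluster object; `deleting` stands in for a set deletionTimestamp."""
    name: str
    deleting: bool = False
    finalizers: List[str] = field(default_factory=list)


def reconcile(cluster: Cluster, bindings: Dict[str, List[str]]) -> bool:
    """Run one reconcile pass; return True once the object may actually be removed."""
    if not cluster.deleting:
        # Make sure the finalizer is present so a later delete waits for cleanup.
        if FINALIZER not in cluster.finalizers:
            cluster.finalizers.append(FINALIZER)
        return False

    if FINALIZER in cluster.finalizers:
        # Cleanup: drop this cluster from every scheduling result.
        for name in list(bindings):
            bindings[name] = [c for c in bindings[name] if c != cluster.name]
        # Only after cleanup succeeds is the finalizer removed.
        cluster.finalizers.remove(FINALIZER)

    return not cluster.finalizers
```

Because the finalizer blocks deletion until the controller removes it, the cleanup can be retried across reconcile passes, which a one-off CLI request cannot do.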

The plan has changed now. Hi @carlory, would you still like to contribute here? It's up to you.

Garrybest avatar Jul 27 '22 06:07 Garrybest

I am willing to try, but I need some time to learn how to implement it.

carlory avatar Jul 28 '22 07:07 carlory

fix in #2373

Garrybest avatar Aug 13 '22 14:08 Garrybest

/assign @Garrybest

RainbowMango avatar Aug 15 '22 01:08 RainbowMango