karmada
RB may not be created when deleting and creating the same object immediately
What happened:
- Create a deployment and wait until it is running.
# kubectl create -f samples/nginx/
# kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
nginx 2/2 2 2 3m4s
- Delete this deployment and create it again immediately; we can use kubectl replace:
# kubectl replace --force -f samples/nginx/deployment.yaml
deployment.apps "nginx" deleted
deployment.apps/nginx replaced
- RB will not be created
# kubectl get rb
NAME SCHEDULED FULLYAPPLIED AGE
What you expected to happen: Recreate RB.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Procedure:
- We delete the deployment.
- The RB is pending garbage collection (GC) but not yet deleted.
- We create the deployment again, and the controller may just update the former RB.
- The RB is deleted by GC.
- The controller cannot create the RB because no more events are received.
I think we could watch the deletion event for the RB and enqueue the object into the detector.
/cc @XiShanYongYe-Chang @pigletfly @RainbowMango
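To make the suggestion above concrete, here is a minimal sketch of watching ResourceBinding deletions and re-enqueuing the referenced object, written against client-go informer handlers. The queue, the key format, and the assumption that spec.resource carries the object reference are illustrative, not Karmada's actual detector wiring.

```go
package detector

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// enqueueObjectOnRBDelete returns handlers that, when a ResourceBinding is
// deleted, push the key of the resource it was created for back onto the
// detector queue, so the object is reconciled again and the RB recreated.
func enqueueObjectOnRBDelete(queue workqueue.RateLimitingInterface) cache.ResourceEventHandlerFuncs {
	return cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			rb, ok := obj.(*unstructured.Unstructured)
			if !ok {
				// The final state may be wrapped in a tombstone.
				tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
				if !ok {
					return
				}
				if rb, ok = tombstone.Obj.(*unstructured.Unstructured); !ok {
					return
				}
			}
			// spec.resource is assumed to hold the reference to the original object.
			apiVersion, _, _ := unstructured.NestedString(rb.Object, "spec", "resource", "apiVersion")
			kind, _, _ := unstructured.NestedString(rb.Object, "spec", "resource", "kind")
			namespace, _, _ := unstructured.NestedString(rb.Object, "spec", "resource", "namespace")
			name, _, _ := unstructured.NestedString(rb.Object, "spec", "resource", "name")
			queue.Add(fmt.Sprintf("%s/%s/%s/%s", apiVersion, kind, namespace, name))
		},
	}
}
```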
Environment:
- Karmada version:
- kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
- Others:
Interesting.
How about adding the UID to the binding name, so that we are able to distinguish bindings for each "object"? But it could break users who rely on the naming rule name-kind:
https://github.com/karmada-io/karmada/blob/cee1a2f4a16b5292b61007ce3f87e887c1e8acc2/pkg/detector/detector.go#L665-L666
Just an option
BTW, I'm thinking name-kind is technically not enough to identify an object, especially for different groups, for example:
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
and
apiVersion: networking.istio.io/v1beta1
kind: Gateway
It makes sense. But I'm afraid it's not user-friendly when the binding name is too long or the name only has an implicit connection with the original object.
Maybe we could hash the uid to make it shorter?
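For illustration, a minimal sketch of the hashed-name idea, assuming we keep the existing name-kind prefix and append a short hash of the UID; the helper name and hash length are made up here, not Karmada's actual naming code. Hashing any stable identity (e.g. group+kind+uid) would also disambiguate the two Gateway examples above.

```go
package names

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// GenerateBindingNameWithHash builds "<name>-<kind>-<hash>". The short FNV
// hash of the UID keeps the name bounded while distinguishing bindings for
// different "objects" that share the same name and kind.
func GenerateBindingNameWithHash(name, kind, uid string) string {
	h := fnv.New32a()
	h.Write([]byte(uid)) // Write on an fnv hash never returns an error.
	return fmt.Sprintf("%s-%s-%08x", strings.ToLower(name), strings.ToLower(kind), h.Sum32())
}

// Example: GenerateBindingNameWithHash("nginx", "Deployment", string(deploy.UID))
// yields something like "nginx-deployment-1a2b3c4d".
```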
I tried several times. Sometimes the RB updates normally, and sometimes this phenomenon does happen. Let's talk about the normal situation first. When we delete the deployment, the RB is pending garbage collection but not yet deleted, basically because the RB is owner-referenced by the deployment. When we create the deployment again, the controller may just update the former RB. If the update happens before GC, the ownerReference in the RB is changed to the newly created deployment, so GC no longer treats this updated RB as needing to be removed. The RB is updated normally and will not be deleted. I don't think we must recreate it in this situation, but we must guarantee eventual consistency.
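A minimal sketch of why the ordering matters here: the garbage collector matches owners by UID, so once the detector rewrites the RB's ownerReference to the newly created deployment, the RB is no longer a dangling dependent. The helper below is illustrative, not the detector's actual code.

```go
package detector

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// refreshOwnerReference points the binding at the deployment that exists now.
// Because OwnerReference matching is by UID, a binding updated this way is no
// longer an orphan of the deleted deployment, and GC keeps it.
func refreshOwnerReference(binding metav1.Object, deploy *appsv1.Deployment) {
	binding.SetOwnerReferences([]metav1.OwnerReference{
		*metav1.NewControllerRef(deploy, appsv1.SchemeGroupVersion.WithKind("Deployment")),
	})
}
```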
But sometimes GC runs before the RB is updated, so now we need to recreate the RB. From the log, I guess an attempt to create the RB happens but does not succeed.
E0630 03:48:43.910451 1 controller.go:317] "controller/resourcebinding: Reconciler error" err="Operation cannot be fulfilled on resourcebindings.work.karmada.io \"nginx-deployment\": the object has been modified; please apply your changes to the latest version and try again" reconciler group="work.karmada.io" reconciler kind="ResourceBinding" name="nginx-deployment" namespace="default"
I think we should recreate the RB instead of updating the ownerReference; if we only update it, the user's recreation would have no real effect. Users want to delete their former deployment and create a new one: imagine we delete the deployment and create a new one, all Pods will be recreated. In terms of this intention, I prefer to delete the obsolete RB and create a new one.
I was wondering why it wasn't created and what the root cause was. I haven't figured it out yet.
- Delete the deployment.
- RB is waiting for GC.
- Create this deployment again immediately.
- The controller watches the creation event and tries to create or update the RB; since the RB has not been deleted by the garbage collector yet, the controller only updates it (see the sketch after this list).
- The RB is deleted by the garbage collector.
- Now the controller will not receive any other events for the deployment, so the RB will not be created until the controller is restarted.
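As a sketch of the update step referenced in the list above: the detector's apply path goes through controller-runtime's CreateOrUpdate, which only creates when the Get returns NotFound. A terminating RB is still "found", so it is merely updated, and after GC removes it nothing re-triggers a create. The simplified call below assumes the work/v1alpha2 ResourceBinding type and is not the real detector code.

```go
package detector

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

// applyBinding creates the RB if it is absent, otherwise updates it in place.
// An RB that is still terminating (DeletionTimestamp set, finalizers pending)
// counts as present, so this path only updates it; once GC finishes, no new
// event for the deployment arrives, and the create never happens.
func applyBinding(ctx context.Context, c client.Client, namespace, name string) error {
	binding := &workv1alpha2.ResourceBinding{
		ObjectMeta: metav1.ObjectMeta{Namespace: namespace, Name: name},
	}
	_, err := controllerutil.CreateOrUpdate(ctx, c, binding, func() error {
		// Mutate labels/spec here; CreateOrUpdate decides create vs. update.
		return nil
	})
	return err
}
```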
You could try this with kubectl replace --force -f samples/nginx/deployment.yaml after the deployment is running. And if we restart karmada-controller-manager, the RB gets created.
@Garrybest Thanks, I figured it out. I was wondering if we could change the logic for updating the resourceBinding: stop updating the old resourceBinding when it has been marked for deletion and return an error instead.
I think it could work. The controller will retry, but we should be aware that the error will trigger an ItemExponentialFailure requeue.
But I also think https://github.com/karmada-io/karmada/issues/2090#issuecomment-1170660576 is another solution to solve all the relevant issues. The main cause is that we conflate two different objects under the same ObjectReference.
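For reference, a minimal sketch of the requeue behaviour mentioned above, using client-go's work queue; the delays shown are client-go's defaults and are here only for illustration.

```go
package detector

import (
	"time"

	"k8s.io/client-go/util/workqueue"
)

// newBindingQueue builds a queue whose per-item retry delay starts at 5ms and
// doubles on every failure up to 1000s (ItemExponentialFailureRateLimiter),
// resetting only when the item finally succeeds and Forget is called. This is
// the backoff an error returned from the update path would go through.
func newBindingQueue() workqueue.RateLimitingInterface {
	return workqueue.NewRateLimitingQueue(
		workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
	)
}
```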
Yeah I believe we should change the naming rule, I will do this if you all agree :)
Sure. #2099 records this problem. We may also need to consider the upgrade path when looking for solutions.
Hi @dddddai, would you like to deal with this issue?
Fix this issue or update the naming rule? As we discussed at the last meeting, if we are going to change the naming rule, we need a backward-compatible solution.
Yes but it's tricky to handle the compatibility
Sorry I didn't attend the meeting, what's your solution?
I'm thinking we can add temporary logic for upgrading; here's the general idea.
Can we just requeue the key if we can get a resourcebinding that matches the pp label and whose deletionTimestamp is not zero?
Our environment encountered this bug (using client-go to delete and then create a Job), and I temporarily fixed it by requeueing the event if an RB exists and its deletionTimestamp is not zero. @Garrybest @dddddai @RainbowMango Can we modify it like this?
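A minimal sketch of the temporary fix described above, assuming a hypothetical check hooked into the detector's reconcile path (the helper name and wiring are not Karmada's actual code): if the matching RB still exists but is terminating, requeue the key instead of updating the doomed binding, so a later pass recreates it after GC.

```go
package detector

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

// shouldRequeueForTerminatingRB reports whether reconciliation should be
// retried later because the existing RB is waiting for garbage collection.
func shouldRequeueForTerminatingRB(ctx context.Context, c client.Client, key types.NamespacedName) (bool, error) {
	rb := &workv1alpha2.ResourceBinding{}
	if err := c.Get(ctx, key, rb); err != nil {
		if apierrors.IsNotFound(err) {
			return false, nil // nothing is terminating, safe to create a new RB
		}
		return false, fmt.Errorf("failed to get ResourceBinding %s: %w", key, err)
	}
	// DeletionTimestamp set: the RB is being garbage collected; wait and retry.
	return !rb.DeletionTimestamp.IsZero(), nil
}
```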
Hi @wuyingjun-lucky, I think we should requeue the event after the RB is deleted. So do you requeue the event before deletion? Because you say "if an RB exists and its deletionTimestamp is not zero".
Yes, the controller will retry and delay the update of the RB. I just fixed it temporarily this way because our environment hit this bug.
It seems like a temporary solution, not the best one, because when we requeue the object it will go through an ItemExponentialFailure requeue.
Why do people delete and create the same object immediately? What behavior do they expect? Do you mean they expect the Pods to be recreated? If there are no changes to the Pod specification, why expect to recreate them? If the specification changes, then no matter whether we re-create the binding or update it, the Pods would be re-created eventually, right?
Let me give an example.
- A user creates a deployment.
- He finds something is wrong, e.g., the image tag is wrong.
- He does not want to do a rolling update of the deployment, so he modifies the deployment YAML and recreates it with kubectl replace --force -f.
- Now he has encountered this issue.
Thanks for the clarification, I get it.
diff --git a/pkg/detector/detector.go b/pkg/detector/detector.go
index be0dc958..7a17e6b7 100644
--- a/pkg/detector/detector.go
+++ b/pkg/detector/detector.go
@@ -451,6 +451,9 @@ func (d *ResourceDetector) ApplyPolicy(object *unstructured.Unstructured, object
var operationResult controllerutil.OperationResult
err = retry.RetryOnConflict(retry.DefaultRetry, func() (err error) {
operationResult, err = controllerutil.CreateOrUpdate(context.TODO(), d.Client, bindingCopy, func() error {
+ if bindingCopy.DeletionTimestamp != nil {
+ return fmt.Errorf("do not update during deleting/gc")
+ }
// Just update necessary fields, especially avoid modifying Spec.Clusters which is scheduling result, if already exists.
bindingCopy.Labels = util.DedupeAndMergeLabels(bindingCopy.Labels, binding.Labels)
Is it a workable solution? (A little bit ugly though)
Maybe we could hash the uid to make it shorter?
Actually, I like this approach, but it's pretty hard to keep it compatible. I will look at the solution @dddddai mentioned above: https://github.com/karmada-io/karmada/issues/2090#issuecomment-1181791827.
Right, it could work temporarily, as https://github.com/karmada-io/karmada/issues/2090#issuecomment-1193306423 said. We could make a quick fix here and then wait for the long-term solution.
+1. @dddddai, what do you say? Your idea (as mentioned at https://github.com/karmada-io/karmada/issues/2090#issuecomment-1181791827) is probably the long-term solution.
I'm ok with the quick fix
So for the long-term solution we are going to change the naming rule to name-kind-hash, right?
If so I will try implementing my idea and see if it works
How about we have a talk at the meeting next week?
Sure
Anything new about this?
If we are gonna change the naming rule and the idea makes sense, I will work on it. @RainbowMango WDYT