karmada
RB may not be created when deleting and creating the same object immediately
What happened:
- Create a deployment and wait until it is running.
# kubectl create -f samples/nginx/
# kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
nginx 2/2 2 2 3m4s
- Delete this deployment and create it again immediately; we can use kubectl replace:
# kubectl replace --force -f samples/nginx/deployment.yaml
deployment.apps "nginx" deleted
deployment.apps/nginx replaced
- RB will not be created
# kubectl get rb
NAME SCHEDULED FULLYAPPLIED AGE
What you expected to happen: Recreate RB.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Procedure:
- We delete the deployment.
- The RB is pending garbage collection (GC) but not yet deleted.
- We create the deployment again, and the controller may just update the former RB.
- The RB is deleted by GC.
- The controller cannot create the RB because no more events are received.
I think we could watch the deletion event for the RB and enqueue the object into the detector.
/cc @XiShanYongYe-Chang @pigletfly @RainbowMango
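To make the suggestion above concrete, here is a minimal sketch of watching ResourceBinding deletions and re-enqueuing the referenced object, written against client-go informer handlers. The queue, the key format, and the assumption that spec.resource carries the object reference are illustrative, not Karmada's actual detector wiring.

```go
package detector

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// enqueueObjectOnRBDelete returns handlers that, when a ResourceBinding is
// deleted, push the key of the resource it was created for back onto the
// detector queue, so the object is reconciled again and the RB recreated.
func enqueueObjectOnRBDelete(queue workqueue.RateLimitingInterface) cache.ResourceEventHandlerFuncs {
	return cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			rb, ok := obj.(*unstructured.Unstructured)
			if !ok {
				// The final state may be wrapped in a tombstone.
				tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
				if !ok {
					return
				}
				if rb, ok = tombstone.Obj.(*unstructured.Unstructured); !ok {
					return
				}
			}
			// spec.resource is assumed to hold the reference to the original object.
			apiVersion, _, _ := unstructured.NestedString(rb.Object, "spec", "resource", "apiVersion")
			kind, _, _ := unstructured.NestedString(rb.Object, "spec", "resource", "kind")
			namespace, _, _ := unstructured.NestedString(rb.Object, "spec", "resource", "namespace")
			name, _, _ := unstructured.NestedString(rb.Object, "spec", "resource", "name")
			queue.Add(fmt.Sprintf("%s/%s/%s/%s", apiVersion, kind, namespace, name))
		},
	}
}
```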
Environment:
- Karmada version:
- kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
- Others:
Interesting.
How about adding the UID to the binding name, so that we are able to distinguish bindings for each "object"? But it could break users who rely on the naming rule name-kind:
https://github.com/karmada-io/karmada/blob/cee1a2f4a16b5292b61007ce3f87e887c1e8acc2/pkg/detector/detector.go#L665-L666
Just an option
BTW, I'm thinking name-kind is technically not enough to identify an object, especially for different groups, for example:
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
and
apiVersion: networking.istio.io/v1beta1
kind: Gateway
It makes sense. But I'm afraid it's not user-friendly when the binding name is too long or the name only has an implicit connection with the original object.
Maybe we could hash the uid to make it shorter?
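For illustration, a minimal sketch of the hashed-name idea, assuming we keep the existing name-kind prefix and append a short hash of the UID; the helper name and hash length are made up here, not Karmada's actual naming code. Hashing any stable identity (e.g. group+kind+uid) would also disambiguate the two Gateway examples above.

```go
package names

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// GenerateBindingNameWithHash builds "<name>-<kind>-<hash>". The short FNV
// hash of the UID keeps the name bounded while distinguishing bindings for
// different "objects" that share the same name and kind.
func GenerateBindingNameWithHash(name, kind, uid string) string {
	h := fnv.New32a()
	h.Write([]byte(uid)) // Write on an fnv hash never returns an error.
	return fmt.Sprintf("%s-%s-%08x", strings.ToLower(name), strings.ToLower(kind), h.Sum32())
}

// Example: GenerateBindingNameWithHash("nginx", "Deployment", string(deploy.UID))
// yields something like "nginx-deployment-1a2b3c4d".
```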
I tried several times. Sometimes the RB updates normally, and sometimes this phenomenon does happen. Let's talk about the normal situation first. When we delete the deployment, the RB is pending garbage collection but not yet deleted, basically because the RB is owner-referenced by the deployment. When we create the deployment again, the controller may just update the former RB. If the update happens before GC, the ownerReference in the RB is changed to the newly created deployment, so GC no longer treats this updated RB as needing to be removed. The RB is updated normally and will not be deleted. I don't think we must recreate it in this situation, but we must guarantee eventual consistency.
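A minimal sketch of why the ordering matters here: the garbage collector matches owners by UID, so once the detector rewrites the RB's ownerReference to the newly created deployment, the RB is no longer a dangling dependent. The helper below is illustrative, not the detector's actual code.

```go
package detector

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// refreshOwnerReference points the binding at the deployment that exists now.
// Because OwnerReference matching is by UID, a binding updated this way is no
// longer an orphan of the deleted deployment, and GC keeps it.
func refreshOwnerReference(binding metav1.Object, deploy *appsv1.Deployment) {
	binding.SetOwnerReferences([]metav1.OwnerReference{
		*metav1.NewControllerRef(deploy, appsv1.SchemeGroupVersion.WithKind("Deployment")),
	})
}
```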
But sometimes GC runs before the RB is updated, so now we need to recreate the RB. From the log, I guess an attempt to create the RB happens but does not succeed.
E0630 03:48:43.910451 1 controller.go:317] "controller/resourcebinding: Reconciler error" err="Operation cannot be fulfilled on resourcebindings.work.karmada.io \"nginx-deployment\": the object has been modified; please apply your changes to the latest version and try again" reconciler group="work.karmada.io" reconciler kind="ResourceBinding" name="nginx-deployment" namespace="default"
I think we should recreate the RB instead of updating the ownerReference; if we only update it, the user's recreation would have no real effect. Users want to delete their former deployment and create a new one: imagine we delete the deployment and create a new one, all Pods will be recreated. In terms of this intention, I prefer to delete the obsolete RB and create a new one.
I was wondering why it wasn't created and what the root cause was. I haven't figured it out yet.
- Delete the deployment.
- RB is waiting for GC.
- Create this deployment again immediately.
- The controller watches the creation event and tries to create or update the RB; since the RB has not been deleted by the garbage collector yet, the controller only updates it (see the sketch after this list).
- The RB is deleted by the garbage collector.
- Now the controller will not receive any other events for the deployment, so the RB will not be created until the controller is restarted.
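As a sketch of the update step referenced in the list above: the detector's apply path goes through controller-runtime's CreateOrUpdate, which only creates when the Get returns NotFound. A terminating RB is still "found", so it is merely updated, and after GC removes it nothing re-triggers a create. The simplified call below assumes the work/v1alpha2 ResourceBinding type and is not the real detector code.

```go
package detector

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

// applyBinding creates the RB if it is absent, otherwise updates it in place.
// An RB that is still terminating (DeletionTimestamp set, finalizers pending)
// counts as present, so this path only updates it; once GC finishes, no new
// event for the deployment arrives, and the create never happens.
func applyBinding(ctx context.Context, c client.Client, namespace, name string) error {
	binding := &workv1alpha2.ResourceBinding{
		ObjectMeta: metav1.ObjectMeta{Namespace: namespace, Name: name},
	}
	_, err := controllerutil.CreateOrUpdate(ctx, c, binding, func() error {
		// Mutate labels/spec here; CreateOrUpdate decides create vs. update.
		return nil
	})
	return err
}
```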
You could try this with kubectl replace --force -f samples/nginx/deployment.yaml after the deployment is running. And if we restart karmada-controller-manager, the RB gets created.
@Garrybest Thanks, I figured it out. I was wondering if we could change the logic for updating the resourceBinding: stop updating the old resourceBinding when it has been marked for deletion and return an error instead.
I think it could work. The controller will retry, but we should be aware that the error will trigger an ItemExponentialFailure requeue.
But I also think https://github.com/karmada-io/karmada/issues/2090#issuecomment-1170660576 is another solution to solve all the relevant issues. The main cause is that we conflate two different objects under the same ObjectReference.
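For reference, a minimal sketch of the requeue behaviour mentioned above, using client-go's work queue; the delays shown are client-go's defaults and are here only for illustration.

```go
package detector

import (
	"time"

	"k8s.io/client-go/util/workqueue"
)

// newBindingQueue builds a queue whose per-item retry delay starts at 5ms and
// doubles on every failure up to 1000s (ItemExponentialFailureRateLimiter),
// resetting only when the item finally succeeds and Forget is called. This is
// the backoff an error returned from the update path would go through.
func newBindingQueue() workqueue.RateLimitingInterface {
	return workqueue.NewRateLimitingQueue(
		workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
	)
}
```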
Yeah I believe we should change the naming rule, I will do this if you all agree :)
Sure. #2099 records this problem. We may also need to consider the upgrade path when looking for solutions.
Hi @dddddai, would you like to deal with this issue?
Fix this issue or update the naming rule? As we discussed at the last meeting, if we are going to change the naming rule, we need a backward-compatible solution.
Yes but it's tricky to handle the compatibility
Sorry I didn't attend the meeting, what's your solution?
I'm thinking we can add temporary logic for upgrading; here's the general idea.
Can we just requeue the key if we can get a resourcebinding that matches the pp label and whose deletionTimestamp is not zero?
Our environment encountered this bug (using client-go to delete and then create a Job), and I temporarily fixed it by requeueing the event if an RB exists and its deletionTimestamp is not zero. @Garrybest @dddddai @RainbowMango Can we modify it like this?
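A minimal sketch of the temporary fix described above, assuming a hypothetical check hooked into the detector's reconcile path (the helper name and wiring are not Karmada's actual code): if the matching RB still exists but is terminating, requeue the key instead of updating the doomed binding, so a later pass recreates it after GC.

```go
package detector

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

// shouldRequeueForTerminatingRB reports whether reconciliation should be
// retried later because the existing RB is waiting for garbage collection.
func shouldRequeueForTerminatingRB(ctx context.Context, c client.Client, key types.NamespacedName) (bool, error) {
	rb := &workv1alpha2.ResourceBinding{}
	if err := c.Get(ctx, key, rb); err != nil {
		if apierrors.IsNotFound(err) {
			return false, nil // nothing is terminating, safe to create a new RB
		}
		return false, fmt.Errorf("failed to get ResourceBinding %s: %w", key, err)
	}
	// DeletionTimestamp set: the RB is being garbage collected; wait and retry.
	return !rb.DeletionTimestamp.IsZero(), nil
}
```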
Hi @wuyingjun-lucky, I think we should requeue the event after the RB is deleted. So do you requeue the event before deletion? Because you say "if an RB exists and its deletionTimestamp is not zero".
Yes, the controller will retry and delay the update of the RB. I just fixed it temporarily this way because our environment hit this bug.
It seems like a temporary solution, not the best one, because when we requeue the object it will go through an ItemExponentialFailure requeue.
Why do people delete and create the same object immediately? What behavior do they expect? Do you mean they expect the Pods to be recreated? If there are no changes to the Pod specification, why expect to recreate them? If the specification changes, then no matter whether we re-create the binding or update it, the Pods would be re-created eventually, right?
Let me give an example.
- A user creates a deployment.
- He finds something is wrong, e.g., the image tag is wrong.
- He does not want to do a rolling update of the deployment, so he modifies the deployment YAML and recreates it with kubectl replace --force -f.
- Now he has encountered this issue.
Thanks for the clarification, I get it.
diff --git a/pkg/detector/detector.go b/pkg/detector/detector.go
index be0dc958..7a17e6b7 100644
--- a/pkg/detector/detector.go
+++ b/pkg/detector/detector.go
@@ -451,6 +451,9 @@ func (d *ResourceDetector) ApplyPolicy(object *unstructured.Unstructured, object
var operationResult controllerutil.OperationResult
err = retry.RetryOnConflict(retry.DefaultRetry, func() (err error) {
operationResult, err = controllerutil.CreateOrUpdate(context.TODO(), d.Client, bindingCopy, func() error {
+ if bindingCopy.DeletionTimestamp != nil {
+ return fmt.Errorf("do not update during deleting/gc")
+ }
// Just update necessary fields, especially avoid modifying Spec.Clusters which is scheduling result, if already exists.
bindingCopy.Labels = util.DedupeAndMergeLabels(bindingCopy.Labels, binding.Labels)
Is it a workable solution? (A little bit ugly though)
Maybe we could hash the uid to make it shorter?
Actually, I like this approach, but it's pretty hard to keep it compatible. I will look at the solution @dddddai mentioned above: https://github.com/karmada-io/karmada/issues/2090#issuecomment-1181791827.
Right, it could work temporarily, as https://github.com/karmada-io/karmada/issues/2090#issuecomment-1193306423 said. We could make a quick fix here and then wait for the long-term solution.
+1. @dddddai, what do you say? Your idea (as mentioned at https://github.com/karmada-io/karmada/issues/2090#issuecomment-1181791827) is probably the long-term solution.
I'm ok with the quick fix
So for the long-term solution we are going to change the naming rule to name-kind-hash, right?
If so I will try implementing my idea and see if it works
How about we have a talk at the meeting next week?
Sure
Anything new about this?
If we are gonna change the naming rule and the idea makes sense, I will work on it. @RainbowMango WDYT