
Pod restarted when promoting an existing deployment from member cluster to Karmada

Open chaosi-zju opened this issue 1 year ago • 6 comments

What happened:

A deployment already exists in the member1 cluster.

The user wants to promote the existing deployment from the member1 cluster to Karmada using conflictResolution: Overwrite.

In the case where a Customization ResourceInterpreter is used in Karmada and an empty field like affinity: {} is used in the ResourceTemplate, the promotion succeeds but the related pod is restarted.

What you expected to happen:

The existing deployment will be promoted by Karmada without the related pod being restarted.

How to reproduce it (as minimally and precisely as possible):

  1. write nginx.yaml as follows
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity: {}
      terminationGracePeriodSeconds: 0
      securityContext: {}
      containers:
        - image: nginx
          lifecycle: {}
          name: nginx
          resources:
            limits:
              cpu: 25m
              memory: 64Mi
          securityContext: {}
  1. write pp.yaml as follows
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: pp
spec:
  conflictResolution: Overwrite
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
  1. write resource-interpreter-customization.yaml as follows
apiVersion: config.karmada.io/v1alpha1
kind: ResourceInterpreterCustomization
metadata:
  name: member-cluster-deploy
spec:
  customizations:
    retention:
      luaScript: |
        function Retain(desiredObj, observedObj)
          local observedAnnotations = observedObj.spec.template.metadata.annotations
          if observedAnnotations ~= nil and observedAnnotations["k8s.haier.net/timestamp"] ~= nil then
            if desiredObj.spec.template.metadata.annotations == nil then
              desiredObj.spec.template.metadata.annotations = {}
            end
            desiredObj.spec.template.metadata.annotations["k8s.haier.net/timestamp"] = observedAnnotations["k8s.haier.net/timestamp"]
          end
          if observedObj.metadata.annotations ~= nil and observedObj.metadata.annotations["karmada.haier.net/retain-replicas"] == "true" then
            desiredObj.spec.replicas = observedObj.spec.replicas
          end
          print("desiredObj ", desiredObj)
          return desiredObj
        end
  target:
    apiVersion: apps/v1
    kind: Deployment
  1. apply nginx.yaml directly to member1 cluster, and watch the changes of pod in member1
kubectl --context member1 apply -f nginx.yaml
kubectl --context member1 get po -w
  1. apply resource-interpreter-customization.yaml to Karmada
kubectl --context karmada-apiserver apply -f resource-interpreter-customization.yaml
  1. apply nginx.yaml and pp.yaml to Karmada
kubectl --context karmada-apiserver apply -f pp.yaml
kubectl --context karmada-apiserver apply -f nginx.yaml
  1. check the result

    the existing deployment is promoted by Karmada, but the related pod is restarted, which is not expected.


Original Problem

Previous Implementation:

  • Empty struct field ({}) -> LuaTable{} -> json encode ([]) -> json (delete []) -> unmarshal -> Field(nil)
  • Empty slice field ([]) -> LuaTable{} -> json encode ([]) -> json (delete []) -> unmarshal -> Field(nil)

Expected Result:

  • Empty struct field ({}) -> LuaTable{} -> json encode ([]) -> json (convert [] to {}) -> unmarshal -> Field({})
  • Empty slice field ([]) -> LuaTable{} -> json encode ([]) -> json (keep []) -> unmarshal -> Field([])

In a word: an empty lua.LTable{} cannot be distinguished from an empty slice or an empty struct when encoding. The sketch below illustrates the ambiguity.
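
For illustration only, here is a small standalone Go sketch, assuming the github.com/yuin/gopher-lua package (the Lua VM used by Karmada's declarative interpreter). It is not Karmada code; it just shows that an empty Lua table exposes nothing a converter could use to tell a JSON object ({}) apart from a JSON array ([]).

package main

import (
	"fmt"

	lua "github.com/yuin/gopher-lua"
)

func main() {
	L := lua.NewState()
	defer L.Close()

	// Both an "empty struct" and an "empty slice" end up as the same thing
	// on the Lua side: a table with no array part and no hash part.
	emptyStruct := L.NewTable() // meant to represent affinity: {}
	emptySlice := L.NewTable()  // meant to represent tolerations: []

	// When converting back, all we can observe is "no elements at all",
	// so the converter has to pick a single JSON form for both tables.
	fmt.Println(emptyStruct.Len(), countPairs(emptyStruct)) // 0 0
	fmt.Println(emptySlice.Len(), countPairs(emptySlice))   // 0 0
}

// countPairs counts the key/value pairs of a table; zero for both tables above.
func countPairs(t *lua.LTable) int {
	n := 0
	t.ForEach(func(_, _ lua.LValue) { n++ })
	return n
}

Since both tables look identical, the conversion in lua_convert.go has to choose one JSON form for them, which is where the information about {} vs [] is lost.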

Solution

Empty object values are not generated out of thin air. Taking Retain(desired, observed) as an example, an empty object value may come from three sources:

  1. originally from the desired object
  2. originally from the observed object
  3. added by the user in the Custom ResourceInterpreter (the probability is low and this is not recommended)

We just need to traverse the fields of luaResult and compare them against the traversal results of the desired and observed objects (a minimal sketch follows the list):

  1. if an empty object field exists as a slice in the desired/observed object, keep [] unchanged.
  2. if an empty object field exists as a struct in the desired/observed object, convert [] back to {}.
  3. if an empty object field doesn't exist in the desired/observed object, stay compatible with the previous implementation and remove the field.
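
To make the idea concrete, here is a minimal Go sketch of the traversal described above. It is not the actual Karmada implementation: alignEmptyFields and its reference argument are hypothetical names, and the objects are plain unstructured values (map[string]interface{} / []interface{}) as they would look after JSON unmarshalling.

package main

import "fmt"

// alignEmptyFields is a hypothetical helper, not Karmada's real function.
// It walks luaResult and, for every empty value, looks at the same field in
// reference (the desired or observed object) to decide the final shape.
func alignEmptyFields(luaResult, reference interface{}) interface{} {
	switch res := luaResult.(type) {
	case map[string]interface{}:
		refMap, _ := reference.(map[string]interface{})
		for k, v := range res {
			var refChild interface{}
			if refMap != nil {
				refChild = refMap[k]
			}
			if isEmpty(v) {
				switch refChild.(type) {
				case map[string]interface{}:
					res[k] = map[string]interface{}{} // rule 2: struct in reference -> convert back to {}
				case []interface{}:
					res[k] = []interface{}{} // rule 1: slice in reference -> keep []
				default:
					delete(res, k) // rule 3: unknown to reference -> remove the field
				}
				continue
			}
			res[k] = alignEmptyFields(v, refChild)
		}
		return res
	case []interface{}:
		refSlice, _ := reference.([]interface{})
		for i, v := range res {
			var refChild interface{}
			if i < len(refSlice) {
				refChild = refSlice[i]
			}
			res[i] = alignEmptyFields(v, refChild)
		}
		return res
	default:
		return luaResult
	}
}

// isEmpty reports whether v is an empty object or an empty list.
func isEmpty(v interface{}) bool {
	switch t := v.(type) {
	case map[string]interface{}:
		return len(t) == 0
	case []interface{}:
		return len(t) == 0
	}
	return false
}

func main() {
	// After lua -> json, both empty fields come back as [] (the ambiguous form),
	// plus one empty field that exists nowhere in the reference object.
	luaResult := map[string]interface{}{
		"affinity":    []interface{}{},
		"tolerations": []interface{}{},
		"extra":       []interface{}{},
	}
	reference := map[string]interface{}{
		"affinity":    map[string]interface{}{}, // a struct in the original object
		"tolerations": []interface{}{},          // a slice in the original object
	}
	fmt.Println(alignEmptyFields(luaResult, reference))
	// affinity becomes {} again, tolerations stays [], extra is removed.
}

Running the sketch converts affinity back to {}, keeps tolerations as [], and drops the empty field the reference does not contain, matching the three rules above.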


Anything else we need to know?:

Environment:

  • Karmada version: latest
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version): latest
  • Others:

chaosi-zju avatar Feb 26 '24 12:02 chaosi-zju

https://github.com/karmada-io/karmada/blob/f04c17f72749a8db253da7b2790b65d8e912ce24/pkg/resourceinterpreter/customized/declarative/luavm/lua_convert.go#L30-L104

In the Retain process of the Custom ResourceInterpreter, after the Lua script is executed, empty fields like affinity: {} have been deleted, so the Work synchronized to the member cluster differs from the previous configuration, which leads to the member cluster's pod being restarted.

/assign

chaosi-zju avatar Feb 26 '24 13:02 chaosi-zju

Does it solve this issue #3617?

ikaven1024 avatar Feb 27 '24 09:02 ikaven1024

Does it solve this issue #3617?

It seems to be the same problem~

I will test whether my PR #4656 can resolve the issue #3617 too

chaosi-zju avatar Feb 27 '24 11:02 chaosi-zju

Hi @ikaven1024, I have tested that #4656 can also solve #3617 !

➜  argo kubectl --context karmada-apiserver get rb                                                                            
NAME                    SCHEDULED   FULLYAPPLIED   AGE
nginx-rollout-rollout   True        True           65s

chaosi-zju avatar Feb 27 '24 11:02 chaosi-zju

Great! Also, this #4449.

RainbowMango avatar Feb 27 '24 12:02 RainbowMango

Does it solve this issue #3617?

it seems that it is the same problem~

I will test whether my PR #4656 can resolve the issue #3617 too

I am sorry for commenting on the wrong issue. Please move these conclusions to #4656 and continue there.

ikaven1024 avatar Feb 27 '24 15:02 ikaven1024