karmada icon indicating copy to clipboard operation
karmada copied to clipboard

fix: avoid delete the key with empty value in object (lua table)

Open chaosi-zju opened this issue 11 months ago • 6 comments

What type of PR is this?

/kind bug

What this PR does / why we need it:

A deployment already existed in member1 cluster.

The user want to promote the existing deployment from member1 cluster to Karmada using conflictResolution: Overwrite.

In the case when Customization ResourceInterpreter is used in Karmada and empty filed like affinity: {} is used in ResourceTemplate, the promotion run suceess but the related pod has restarted.

What we expected is that the existing deployment will be promoted by Karmada without related pod restarted.

The reason lies in the Retain process of Custom ResourceInterpreter, after Lua script is executed, the empty filed like affinity: {} has been deleted, so the Work synchronized to member cluster is different from previous configuration, which lead to the member cluster's pod restarted.

The resolution is avoid deleting the key with empty value in object (lua table), keep the same as previously configurated.

Which issue(s) this PR fixes:

Fixes #4653

Fixes #3617

Fixes #4449

Special notes for your reviewer:

  • Original Problem:

Previous Implementation: Empty struct field({}) -> LuaTable{} -> json encode ([]) -> json (delete []) -> unmarshal -> Field(nil) Empty slice field([]) -> LuaTable{} -> json encode ([]) -> json (delete []) -> unmarshal -> Field(nil)

Expected Result: Empty struct field({}) -> LuaTable{} -> json encode ([]) -> json (convert [] to {}) -> unmarshal -> Field({}) Empty slice field([]) -> LuaTable{} -> json encode ([]) -> json (keep []) -> unmarshal -> Field([])

In a word: empty lua.LTable{} can not be distinguished from empty slice or empty struct when encoding.

  • Solution

Empty object value are not generated out of thin air, take Retain(desired, observed) for example, empty object value may come from three sources: 1. originally from desired object 2. originally from observed object 3. user added in Custom ResourceInterpreter (the probability is low and not recommended)

We just need to traverse the fields of luaResult, compared to the traversal result of disired and observed objects, 1. if an empty object field exists as a slice in the disired/observed object, keep [] unchanged. 2. if an empty object field exists as a struct in the disired/observed object, convert [] back to {}. 3. if an empty object field doesn't exist in disired/observed object, compatible with previous implementations, remove this field.

Does this PR introduce a user-facing change?:


chaosi-zju avatar Feb 27 '24 09:02 chaosi-zju

@ikaven1024 hello, do you have time to help take a look at this problem? Thank you very much~

chaosi-zju avatar Feb 27 '24 09:02 chaosi-zju

Codecov Report

Attention: Patch coverage is 85.84071% with 16 lines in your changes are missing coverage. Please review.

Project coverage is 51.86%. Comparing base (bc93167) to head (e2babc3). Report is 111 commits behind head on master.

Files Patch % Lines
...preter/customized/declarative/luavm/lua_convert.go 85.04% 10 Missing and 6 partials :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4656      +/-   ##
==========================================
+ Coverage   51.44%   51.86%   +0.41%     
==========================================
  Files         250      250              
  Lines       24951    25056     +105     
==========================================
+ Hits        12837    12995     +158     
+ Misses      11407    11348      -59     
- Partials      707      713       +6     
Flag Coverage Δ
unittests 51.86% <85.84%> (+0.41%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Feb 27 '24 12:02 codecov-commenter

Does it solve this issue #3617?

Hi @ikaven1024, it seems that it is the same problem as #3617~

And, I have tested that this PR can also solve #3617 !

➜  argo kubectl --context karmada-apiserver get rb                                                                            
NAME                    SCHEDULED   FULLYAPPLIED   AGE
nginx-rollout-rollout   True        True           65s

chaosi-zju avatar Feb 27 '24 15:02 chaosi-zju

I find a unsatisfied case.

  1. Apply Rollout CRDs on both Karmada APIserver and member1 as per https://argo-rollouts.readthedocs.io/en/stable/installation/#controller-installation.
  2. Apply a ResourceInterpreterCustomization to Karmada APIserver.
apiVersion: config.karmada.io/v1alpha1
kind: ResourceInterpreterCustomization
metadata:
  name: declarative-configuration-argo-rollout
spec:
  target:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
  customizations:
    retention:
      luaScript: |
        function Retain(desiredObj, observedObj)
          print("[DEBUG] desiredObj ", desiredObj)
          return desiredObj
        end

  1. Apply the Rollout yaml to Karmada APIserver (modified on the basis of #3617)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  labels:
    app: nginx
  name: nginx-rollout
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    canary:
      maxSurge: 100%
      maxUnavailable: 0
      steps:
        - setWeight: 40
        - pause: {}
        - experiment:
            templates: []
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx
          imagePullPolicy: Always
          name: nginx
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  1. Apply a PropagationPolicy as follows:
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
    - apiVersion: argoproj.io/v1alpha1
      kind: Rollout
      name: nginx-rollout
  placement:
    replicaScheduling:
      replicaDivisionPreference: Weighted
      replicaSchedulingType: Divided
    clusterAffinity:
      clusterNames:
        - member1

Pay attention to the Rollout resource, there is a spec.strategy.canary.steps[2].experiment.templates filed, its value is a slice with zero size, the filed is defined as:

// Templates what templates that should be added to the experiment. Should be non-nil
// +patchMergeKey=name
// +patchStrategy=merge
Templates []RolloutExperimentTemplate `json:"templates" patchStrategy:"merge" patchMergeKey:"name" protobuf:"bytes,1,rep,name=templates"`

However, in the process of ConvertLuaResultInto, the key-pair templates: [] will keep existed in jsonBytes variable, and would be incorrectly converted to templates: {} by my this PR, which will lead to jsonBytes cannot be unmarshal back to Rollout struct,

chaosi-zju avatar Feb 28 '24 07:02 chaosi-zju

After numerous days of meticulous exploration, I finally found the most comprehensive solution.

I added detailed comments and UT in the code, thanks for your precious time to review again~

@ikaven1024 @XiShanYongYe-Chang @RainbowMango

chaosi-zju avatar Mar 02 '24 15:03 chaosi-zju

Hi @XiShanYongYe-Chang, all your comments have been resolved, do you have time to check again?

Hi @ikaven1024, Do you have any further comments?

chaosi-zju avatar Mar 06 '24 08:03 chaosi-zju

Thanks~ LGTM

XiShanYongYe-Chang avatar Mar 06 '24 08:03 XiShanYongYe-Chang

kindly ping @ikaven1024 ~ ٩(๑❛ᴗ❛๑)۶

chaosi-zju avatar Mar 28 '24 02:03 chaosi-zju

Is there any further comments?

chaosi-zju avatar Apr 03 '24 10:04 chaosi-zju

/lgtm /assign @XiShanYongYe-Chang

ikaven1024 avatar Apr 07 '24 12:04 ikaven1024

Hi @chaosi-zju, can you help fill out the release note?

/cc @RainbowMango

XiShanYongYe-Chang avatar Apr 08 '24 03:04 XiShanYongYe-Chang

Hi @chaosi-zju, can you help fill out the release note?

done

chaosi-zju avatar Apr 08 '24 08:04 chaosi-zju

/approve

RainbowMango avatar Apr 16 '24 03:04 RainbowMango

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango, XiShanYongYe-Chang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot avatar Apr 16 '24 03:04 karmada-bot