karmada icon indicating copy to clipboard operation
karmada copied to clipboard

OverridePolicy will not work if op is applied after PropagationPolicy and op has nil ResourceSelectors

Open wuyingjun-lucky opened this issue 2 years ago • 18 comments

What happened: OverridePolicy will work if op is applied before PropagationPolicy created(rb created) and op has nil ResourceSelectors What you expected to happen: 1.OverridePolicy will work if op is applied after PropagationPolicy created(rb created) and op has nil ResourceSelectors 2.OverridePolicy will work if op is applied before PropagationPolicy created(rb created) and op has nil ResourceSelectors How to reproduce it (as minimally and precisely as possible):

  1. create deployment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox1
  labels:
    app: busybox1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox1
  template:
    metadata:
      name: busy-p
      labels:
        app: busybox1
    spec:
      containers:
        - name: busy-c
          image: busybox:latest
          imagePullPolicy: IfNotPresent
          command: ["sleep", "6000"]
  1. create pp
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: busybox-propagation1
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      labelSelector: 
         matchLabels:
              app: busybox1
  placement:
    clusterAffinity:
      clusterNames:
        - cluster30
      # exclude:
      #   - cluster30
    spreadConstraints:
        - maxGroups: 1
  1. create op after 1s
---
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: busybox-overridepolicy1
  namespace: default
spec:
  
  overrideRules:
    - targetCluster:
        clusterNames:
          - cluster30
      overriders:
        plaintext:
          - path: "/spec/replicas"
            operator: replace
            value: 3

not desired output

status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2022-10-28T08:57:20Z"
    lastUpdateTime: "2022-10-28T08:57:20Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2022-10-28T08:57:18Z"
    lastUpdateTime: "2022-10-28T08:57:20Z"
    message: ReplicaSet "busybox1-5664dcbdf8" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

But if we created op before pp desired output

status:
  availableReplicas: 3
  conditions:
  - lastTransitionTime: "2022-10-28T08:57:18Z"
    lastUpdateTime: "2022-10-28T08:57:20Z"
    message: ReplicaSet "busybox1-5664dcbdf8" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2022-10-28T08:59:58Z"
    lastUpdateTime: "2022-10-28T08:59:58Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 2
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3

Anything else we need to know?:

Environment:

  • Karmada version:
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
  • Others:

wuyingjun-lucky avatar Oct 28 '22 09:10 wuyingjun-lucky

/assign

wuyingjun-lucky avatar Oct 28 '22 09:10 wuyingjun-lucky

/cc @RainbowMango @XiShanYongYe-Chang @chaunceyjiang

wuyingjun-lucky avatar Oct 28 '22 09:10 wuyingjun-lucky

It seems the newOverridePolicyFunc prevent the binding_controller reconcile, Can we modify here?

wuyingjun-lucky avatar Oct 28 '22 09:10 wuyingjun-lucky

Hi @wuyingjun-lucky, what does op has no ResourceSelectors mean? I don't see that in the example you've given.

The logic of newOverridePolicyFunc seems will reconcile the rbs that match the op: https://github.com/karmada-io/karmada/blob/e6e946ff51ecca5cbb7d8c0e9931b559e44313e9/pkg/controllers/binding/binding_controller.go#L246-L267

XiShanYongYe-Chang avatar Oct 28 '22 09:10 XiShanYongYe-Chang

apiVersion: policy.karmada.io/v1alpha1 kind: OverridePolicy metadata: name: busybox-overridepolicy1 namespace: default spec:

overrideRules: - targetCluster: clusterNames: - cluster30 overriders: plaintext: - path: "/spec/replicas" operator: replace value: 3

Sorry , I have some wrong in pasting OP

---
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: busybox-overridepolicy1
  namespace: default
spec:

  overrideRules:
    - targetCluster:
        clusterNames:
          - cluster30
      overriders:
        plaintext:
          - path: "/spec/replicas"
            operator: replace
            value: 3

@XiShanYongYe-Chang

wuyingjun-lucky avatar Oct 28 '22 09:10 wuyingjun-lucky

The rbs may not be matched. We can add some logs for debugging:

https://github.com/karmada-io/karmada/blob/e6e946ff51ecca5cbb7d8c0e9931b559e44313e9/pkg/controllers/binding/binding_controller.go#L261

XiShanYongYe-Chang avatar Oct 28 '22 09:10 XiShanYongYe-Chang

The rbs may not be matched. We can add some logs for debugging:

https://github.com/karmada-io/karmada/blob/e6e946ff51ecca5cbb7d8c0e9931b559e44313e9/pkg/controllers/binding/binding_controller.go#L261

1 I create deployment 2 I create op with nil resource selector in same ns 3 I create pp in same ns

The op will works . Because

// ResourceSelectors restricts resource types that this override policy applies to.
	// nil means matching all resources.
	// +optional
	ResourceSelectors []ResourceSelector `json:"resourceSelectors,omitempty"`

and

for _, policy := range policies {
		if len(policy.GetOverrideSpec().ResourceSelectors) == 0 {
			resourceMatchingPolicies = append(resourceMatchingPolicies, policy)
			continue
		}

		if util.ResourceMatchSelectors(resource, policy.GetOverrideSpec().ResourceSelectors...) {
			resourceMatchingPolicies = append(resourceMatchingPolicies, policy)
		}
	}

But if we change step 2 and 3 We create pp first and op later It will not reconcile again because the resource selector of op is nil

for _, rs := range overrideRS {
				if util.ResourceMatches(workload, rs) {
					klog.V(2).Infof("Enqueue ResourceBinding(%s/%s) as override policy(%s/%s) changes.", binding.Namespace, binding.Name, a.GetNamespace(), a.GetName())
					requests = append(requests, reconcile.Request{NamespacedName: types.NamespacedName{Namespace: binding.Namespace, Name: binding.Name}})
					break
				}
			}

wuyingjun-lucky avatar Oct 28 '22 10:10 wuyingjun-lucky

我们的环境出现的问题情况 就是 如果 op的 resourceselector 是空的 ,当你创建这个op的时候,他无法触发binding_controller 的调和。 但是 按照 op的接口设计 resourceselector 为空应该是match 这个 ns下 所有的。 只有 在 binding_controller的因为 resourcebinding 或者其他资源调和 之前 这个 带有 空的 resourceselector 的 op 已经被应用了,他才会应用成功。 不知道我是否理解和描述的有问题。

wuyingjun-lucky avatar Oct 28 '22 10:10 wuyingjun-lucky

In addatation, when op has no ResourceSelectors, we create deployment first, then create pp, finally we create op. op will take effect. At this time we modify op, The new changes will not be applied to the deployment. They are caused by the same reason. when op has no ResourceSelectorsr created or modified, the work object already existed will not be affected

hanweisen avatar Oct 28 '22 14:10 hanweisen

Thanks, @wuyingjun-lucky @hanweisen I got it, There's a real problem with this behavior. When op's resourceselector is nil, this logic needs to put all bindings into requests in newOverridePolicyFunc.

In principle, it's supposed to be this behavior. However, in the production environment, if users do this, is it possible that a large number of bindings are triggered for reconcile, affecting performance?

XiShanYongYe-Chang avatar Oct 29 '22 09:10 XiShanYongYe-Chang

Thanks, @wuyingjun-lucky @hanweisen I got it, There's a real problem with this behavior. When op's resourceselector is nil, this logic needs to put all bindings into requests in newOverridePolicyFunc.

In principle, it's supposed to be this behavior. However, in the production environment, if users do this, is it possible that a large number of bindings are triggered for reconcile, affecting performance?

  1. I think users should be cautious with usage of nil resource selector in overridepolicy, especially in production environment.

  2. If the event on overridepolicy (nil rs) can not trigger the reconcile of resourcebinding that will cause many overridepolicies can not work and all modification on this situation will be useless. That may let users confused.

Should we let the behaviour on overridepolicy (nil rs) keep pace with normal overridepolicy or should we make it more clear on the api comments ?

wuyingjun-lucky avatar Oct 31 '22 02:10 wuyingjun-lucky

Should the behaviour on overridepolicy (nil rs) keep pace with normal overridepolicy or should we make it more clear on the api comments ?

Let's invite more people to discuss it. /cc @RainbowMango @chaunceyjiang Can you help take a look?

XiShanYongYe-Chang avatar Oct 31 '22 02:10 XiShanYongYe-Chang

Indeed. I can reproduce it.

Example A
1. create deployment
2. create OP (resourceSelectors is nil)
3. create PP

The OP will take effect.
Example B
1. create deployment
2. create PP
3. wait 10s
4. create OP (resourceSelectors is nil)

The OP will not take effect.
Example C
1. create deployment
2. create an OP named `test` (resourceSelectors is nil)
3. create PP
4. wait 10s
5. modify the `overrides` of an OP named `test`

The new op will not take effect.

chaunceyjiang avatar Oct 31 '22 06:10 chaunceyjiang

Whether the OP take effect or not is related to the time of creation, which will definitely confuse users.

If a user deploys a private application to karmada using helm, and there are some OPs in the helm chart package, it becomes uncontrollable whether these OPs take effect or not. 🤔

chaunceyjiang avatar Oct 31 '22 06:10 chaunceyjiang

Whether the OP take effect or not is related to the time of creation, which will definitely confuse users.

If a user deploys a private application to karmada using helm, and there are some OPs in the helm chart package, it becomes uncontrollable whether these OPs take effect or not. 🤔

So should we let the behaviour on overridepolicy (nil rs) keep pace with normal overridepolicy or should we make it more clear on the api comments ?

wuyingjun-lucky avatar Oct 31 '22 06:10 wuyingjun-lucky

let the behaviour on overridepolicy (nil rs) keep pace with normal overridepolicy

+1

By the way. I suggest you put this issute into the next Karmada meeting.

chaunceyjiang avatar Oct 31 '22 07:10 chaunceyjiang

let the behaviour on overridepolicy (nil rs) keep pace with normal overridepolicy

+1

By the way. I suggest you put this issute into the next Karmada meeting.

OK, If we can not draw a conclusion. I will put it into the meeting.

wuyingjun-lucky avatar Oct 31 '22 07:10 wuyingjun-lucky

OK, If we can not draw a conclusion. I will put it into the meeting.

It will be ok~

XiShanYongYe-Chang avatar Oct 31 '22 07:10 XiShanYongYe-Chang