karmada
karmada copied to clipboard
OverridePolicy will not work if op is applied after PropagationPolicy and op has nil ResourceSelectors
What happened: OverridePolicy will work if op is applied before PropagationPolicy created(rb created) and op has nil ResourceSelectors What you expected to happen: 1.OverridePolicy will work if op is applied after PropagationPolicy created(rb created) and op has nil ResourceSelectors 2.OverridePolicy will work if op is applied before PropagationPolicy created(rb created) and op has nil ResourceSelectors How to reproduce it (as minimally and precisely as possible):
- create deployment
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: busybox1
labels:
app: busybox1
spec:
replicas: 1
selector:
matchLabels:
app: busybox1
template:
metadata:
name: busy-p
labels:
app: busybox1
spec:
containers:
- name: busy-c
image: busybox:latest
imagePullPolicy: IfNotPresent
command: ["sleep", "6000"]
- create pp
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
name: busybox-propagation1
spec:
resourceSelectors:
- apiVersion: apps/v1
kind: Deployment
labelSelector:
matchLabels:
app: busybox1
placement:
clusterAffinity:
clusterNames:
- cluster30
# exclude:
# - cluster30
spreadConstraints:
- maxGroups: 1
- create op after 1s
---
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
name: busybox-overridepolicy1
namespace: default
spec:
overrideRules:
- targetCluster:
clusterNames:
- cluster30
overriders:
plaintext:
- path: "/spec/replicas"
operator: replace
value: 3
not desired output
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2022-10-28T08:57:20Z"
lastUpdateTime: "2022-10-28T08:57:20Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2022-10-28T08:57:18Z"
lastUpdateTime: "2022-10-28T08:57:20Z"
message: ReplicaSet "busybox1-5664dcbdf8" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 1
readyReplicas: 1
replicas: 1
updatedReplicas: 1
But if we created op before pp desired output
status:
availableReplicas: 3
conditions:
- lastTransitionTime: "2022-10-28T08:57:18Z"
lastUpdateTime: "2022-10-28T08:57:20Z"
message: ReplicaSet "busybox1-5664dcbdf8" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2022-10-28T08:59:58Z"
lastUpdateTime: "2022-10-28T08:59:58Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 2
readyReplicas: 3
replicas: 3
updatedReplicas: 3
Anything else we need to know?:
Environment:
- Karmada version:
- kubectl-karmada or karmadactl version (the result of
kubectl-karmada version
orkarmadactl version
): - Others:
/assign
/cc @RainbowMango @XiShanYongYe-Chang @chaunceyjiang
It seems the newOverridePolicyFunc
prevent the binding_controller reconcile, Can we modify here?
Hi @wuyingjun-lucky, what does op has no ResourceSelectors
mean? I don't see that in the example you've given.
The logic of newOverridePolicyFunc
seems will reconcile the rbs that match the op:
https://github.com/karmada-io/karmada/blob/e6e946ff51ecca5cbb7d8c0e9931b559e44313e9/pkg/controllers/binding/binding_controller.go#L246-L267
apiVersion: policy.karmada.io/v1alpha1 kind: OverridePolicy metadata: name: busybox-overridepolicy1 namespace: default spec:
overrideRules: - targetCluster: clusterNames: - cluster30 overriders: plaintext: - path: "/spec/replicas" operator: replace value: 3
Sorry , I have some wrong in pasting OP
---
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
name: busybox-overridepolicy1
namespace: default
spec:
overrideRules:
- targetCluster:
clusterNames:
- cluster30
overriders:
plaintext:
- path: "/spec/replicas"
operator: replace
value: 3
@XiShanYongYe-Chang
The rbs may not be matched. We can add some logs for debugging:
https://github.com/karmada-io/karmada/blob/e6e946ff51ecca5cbb7d8c0e9931b559e44313e9/pkg/controllers/binding/binding_controller.go#L261
The rbs may not be matched. We can add some logs for debugging:
https://github.com/karmada-io/karmada/blob/e6e946ff51ecca5cbb7d8c0e9931b559e44313e9/pkg/controllers/binding/binding_controller.go#L261
1
I create deployment
2
I create op with nil resource selector in same ns
3
I create pp in same ns
The op will works . Because
// ResourceSelectors restricts resource types that this override policy applies to.
// nil means matching all resources.
// +optional
ResourceSelectors []ResourceSelector `json:"resourceSelectors,omitempty"`
and
for _, policy := range policies {
if len(policy.GetOverrideSpec().ResourceSelectors) == 0 {
resourceMatchingPolicies = append(resourceMatchingPolicies, policy)
continue
}
if util.ResourceMatchSelectors(resource, policy.GetOverrideSpec().ResourceSelectors...) {
resourceMatchingPolicies = append(resourceMatchingPolicies, policy)
}
}
But if we change step 2 and 3 We create pp first and op later It will not reconcile again because the resource selector of op is nil
for _, rs := range overrideRS {
if util.ResourceMatches(workload, rs) {
klog.V(2).Infof("Enqueue ResourceBinding(%s/%s) as override policy(%s/%s) changes.", binding.Namespace, binding.Name, a.GetNamespace(), a.GetName())
requests = append(requests, reconcile.Request{NamespacedName: types.NamespacedName{Namespace: binding.Namespace, Name: binding.Name}})
break
}
}
我们的环境出现的问题情况 就是 如果 op的 resourceselector 是空的 ,当你创建这个op的时候,他无法触发binding_controller 的调和。 但是 按照 op的接口设计 resourceselector 为空应该是match 这个 ns下 所有的。 只有 在 binding_controller的因为 resourcebinding 或者其他资源调和 之前 这个 带有 空的 resourceselector 的 op 已经被应用了,他才会应用成功。 不知道我是否理解和描述的有问题。
In addatation, when op has no ResourceSelectors, we create deployment first, then create pp, finally we create op. op will take effect. At this time we modify op, The new changes will not be applied to the deployment. They are caused by the same reason. when op has no ResourceSelectorsr created or modified, the work object already existed will not be affected
Thanks, @wuyingjun-lucky @hanweisen I got it, There's a real problem with this behavior. When op's resourceselector
is nil, this logic needs to put all bindings into requests
in newOverridePolicyFunc
.
In principle, it's supposed to be this behavior. However, in the production environment, if users do this, is it possible that a large number of bindings are triggered for reconcile, affecting performance?
Thanks, @wuyingjun-lucky @hanweisen I got it, There's a real problem with this behavior. When op's
resourceselector
is nil, this logic needs to put all bindings intorequests
innewOverridePolicyFunc
.In principle, it's supposed to be this behavior. However, in the production environment, if users do this, is it possible that a large number of bindings are triggered for reconcile, affecting performance?
-
I think users should be cautious with usage of nil resource selector in overridepolicy, especially in production environment.
-
If the event on overridepolicy (nil rs) can not trigger the reconcile of resourcebinding that will cause many overridepolicies can not work and all modification on this situation will be useless. That may let users confused.
Should we let the behaviour on overridepolicy (nil rs) keep pace with normal overridepolicy or should we make it more clear on the api comments ?
Should the behaviour on overridepolicy (nil rs) keep pace with normal overridepolicy or should we make it more clear on the api comments ?
Let's invite more people to discuss it. /cc @RainbowMango @chaunceyjiang Can you help take a look?
Indeed. I can reproduce it.
Example A
1. create deployment
2. create OP (resourceSelectors is nil)
3. create PP
The OP will take effect.
Example B
1. create deployment
2. create PP
3. wait 10s
4. create OP (resourceSelectors is nil)
The OP will not take effect.
Example C
1. create deployment
2. create an OP named `test` (resourceSelectors is nil)
3. create PP
4. wait 10s
5. modify the `overrides` of an OP named `test`
The new op will not take effect.
Whether the OP take effect or not is related to the time of creation, which will definitely confuse users.
If a user deploys a private application to karmada
using helm
, and there are some OPs in the helm chart package, it becomes uncontrollable whether these OPs take effect or not. 🤔
Whether the OP take effect or not is related to the time of creation, which will definitely confuse users.
If a user deploys a private application to
karmada
usinghelm
, and there are some OPs in the helm chart package, it becomes uncontrollable whether these OPs take effect or not. 🤔
So should we let the behaviour on overridepolicy (nil rs) keep pace with normal overridepolicy or should we make it more clear on the api comments ?
let the behaviour on overridepolicy (nil rs) keep pace with normal overridepolicy
+1
By the way. I suggest you put this issute into the next Karmada meeting.
let the behaviour on overridepolicy (nil rs) keep pace with normal overridepolicy
+1
By the way. I suggest you put this issute into the next Karmada meeting.
OK, If we can not draw a conclusion. I will put it into the meeting.
OK, If we can not draw a conclusion. I will put it into the meeting.
It will be ok~