logging-operator
FluentDConfigCheck does not get scheduled due to inherited anti-affinity rules
I have added the following pod anti-affinity rules to my fluentd config to ensure that the pods are spread across the nodes:
```yaml
fluentd:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: "app.kubernetes.io/name"
                operator: In
                values:
                  - fluentd
              - key: "app.kubernetes.io/component"
                operator: In
                values:
                  - fluentd
          topologyKey: "kubernetes.io/hostname"
```
Now the fluentd-configcheck pods stay in a Pending state. It seems that they inherit the configuration of the fluentd StatefulSet.
18m Warning FailedScheduling pod/logging-operator-fluentd-configcheck-e97a6f8f 0/8 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) didn't match pod affinity/anti-affinity rules, 3 node(s) didn't match pod anti-affinity rules, 3 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate.
Is there a way to make sure that the fluentd pods are distributed, but the configcheck pods are still scheduled?
Unfortunately, there is currently no way other than disabling the configcheck. With a few changes it could be made configurable separately, though.
Thanks, I have removed the affinity rules for now. My other idea, reducing the number of fluentd replicas so that the configcheck can run periodically, works only as long as all the nodes are available and ready.
I think it would be useful to be able to configure the configcheck separately.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions!
Can't we just filter out the pod anti-affinities altogether for the fluentd configcheck here? https://github.com/kube-logging/logging-operator/blob/e0331c4b508ff54e8b0958d29ace7e8d7427674b/pkg/resources/fluentd/appconfigmap.go#L222
yes, I don't think it makes sense to use the affinity rules in the configcheck pod
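For illustration only, a minimal sketch of what that filtering could look like where the configcheck pod spec is assembled: any node affinity from the fluentd spec is kept, but the pod affinity/anti-affinity terms are dropped. The helper name is hypothetical and not existing operator code.

```go
package fluentd

import (
	corev1 "k8s.io/api/core/v1"
)

// configCheckAffinity is a hypothetical helper: it reuses the affinity
// configured for the fluentd StatefulSet, but strips the pod affinity and
// pod anti-affinity terms so the one-off configcheck pod cannot be blocked
// by rules written for the fluentd replicas.
func configCheckAffinity(fluentdAffinity *corev1.Affinity) *corev1.Affinity {
	if fluentdAffinity == nil {
		return nil
	}
	return &corev1.Affinity{
		// Node affinity (e.g. architecture or zone constraints) usually still
		// applies to the check pod, so it is kept as-is.
		NodeAffinity: fluentdAffinity.NodeAffinity,
		// PodAffinity and PodAntiAffinity are intentionally left empty.
	}
}
```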
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions!
I still don't think there is a good reason to apply affinity rules to the configcheck pod, but I also don't want to break this for those who might use it (even if it's accidental right now).
The proper solution would be to add override options for the configcheck pod, but the logging resource is already very big and would require restructuring.
So the options I see:
- we disable inheriting affinity rules (both affinity and anti-affinity) for configcheck pods (backwards incompatible)
- we add a flag to control whether we want to allow inheriting the rules or not (a rough sketch follows this list), but then:
  - enable the flag by default (disable inheriting by default, which could break existing setups, and let users opt in)
  - disable the flag by default (enable inheriting by default, but let users enable it, which is not a good experience if most of the time it's not needed)
- we live with this and say the configcheck pod requires an extra node in this specific case.
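To make the flag option a bit more concrete, here is a rough, purely illustrative sketch; the boolean parameter stands in for a hypothetical opt-in field on the logging resource and is not an existing API:

```go
package fluentd

import (
	corev1 "k8s.io/api/core/v1"
)

// buildConfigCheckAffinity sketches the flag-controlled variant: the fluentd
// (anti-)affinity rules are only inherited when explicitly requested,
// otherwise only the node affinity is carried over.
func buildConfigCheckAffinity(fluentdAffinity *corev1.Affinity, inheritAffinity bool) *corev1.Affinity {
	if fluentdAffinity == nil {
		return nil
	}
	if inheritAffinity {
		// Opt-in: behave exactly as today and copy everything.
		return fluentdAffinity.DeepCopy()
	}
	// Default: keep node affinity, drop pod affinity/anti-affinity terms.
	return &corev1.Affinity{NodeAffinity: fluentdAffinity.NodeAffinity}
}
```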
What do you think?
also cc @ahma @tarokkk
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions!
Any updates?
After all this time, I would simply go with this: https://github.com/kube-logging/logging-operator/pull/1787