fdb-kubernetes-operator
multiple operators leads to node explosion
We had multiple operators configured in different namespaces, each managing clusters across all namespaces - a bad idea, yes :)).
However, I think it is important that the operator fail safely in that situation, as it can happen by mistake.
What we observed was that when we resized a cluster, both operators would trigger the new pods; one would then write to the cluster resource successfully, and the other would get a 409 Conflict and retry shortly after.
So adding, say, 10 pods would result in 20 pods being added. I don't know what would have happened on a shrink, as we found and fixed the configuration issue before attempting one.
I'm not sure of the best way to fix it, since atomic CAS operations on Pods aren't exactly available - but something along those lines would be ideal; using fdbcli as a serialisation point for whichever cluster actions can go through it would be an obvious start.
We could try to attach something to the cluster resource to indicate which instance of the operator owns it. I think all we would have to go on is the namespace the operator is running in, but that might be enough.
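To make that concrete, here is a minimal sketch of claiming ownership through an annotation, relying on Kubernetes' optimistic concurrency (the same resourceVersion mechanism behind the 409s we saw). The annotation key and the use of the operator's namespace as its identity are assumptions for illustration, not anything the operator does today:

```go
package ownership

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ownerAnnotation is a hypothetical key; the operator defines nothing like it today.
const ownerAnnotation = "example.org/fdb-operator-owner"

// claimCluster reports whether this operator instance owns the cluster,
// claiming it if it is currently unowned. Update sends the cached
// resourceVersion, so two operators racing to claim the same cluster cannot
// both succeed: the loser gets a 409 Conflict and must re-read and retry.
func claimCluster(ctx context.Context, c client.Client, cluster client.Object, identity string) (bool, error) {
	annotations := cluster.GetAnnotations()
	if owner, ok := annotations[ownerAnnotation]; ok {
		return owner == identity, nil
	}
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[ownerAnnotation] = identity
	cluster.SetAnnotations(annotations)
	if err := c.Update(ctx, cluster); err != nil {
		if apierrors.IsConflict(err) {
			return false, nil // another operator claimed it first
		}
		return false, err
	}
	return true, nil
}
```

An operator that loses the race would then skip the cluster instead of driving a second set of pod additions.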
See also #325, though I'm not sure that will help across namespaces.
In this case leader election will not help, since it only works for operators in the same namespace (and actually I think it should only work for operators of the same deployment). What we could do to prevent this is create an extra field for an operator name
(similar to: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/#specify-schedulers-for-pods) to identify the operator that should manage the cluster. This wouldn't prevent a user from configuring two operator deployments with the same name, but at least a user would have a way to run multiple operators without a conflict.
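For illustration, a minimal sketch of what that could look like, modelled on the schedulerName field for Pods. The OperatorName field and the "default" fallback are invented for this example and do not exist in the CRD:

```go
package operatorname

// ClusterSpec stands in for the FoundationDBCluster spec, extended with the
// proposed field (hypothetical; not part of the CRD today).
type ClusterSpec struct {
	OperatorName string
}

// Reconciler stands in for the operator's reconciler, started with a name,
// e.g. via a command-line flag on each deployment.
type Reconciler struct {
	OperatorName string
}

// shouldManage mirrors how kube-scheduler ignores Pods whose schedulerName
// names another scheduler: each operator simply skips clusters that are
// addressed to someone else.
func (r *Reconciler) shouldManage(spec ClusterSpec) bool {
	if spec.OperatorName == "" {
		// An unset field could mean "managed by the default operator".
		return r.OperatorName == "default"
	}
	return spec.OperatorName == r.OperatorName
}
```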
Is the goal here to prevent bad behavior when accidentally running multiple operators, or to allow the user to run multiple operators without a conflict? In the latter case, I'm not sure what the desired semantics would be. In the former case, we'd have to consider how to infer that the operators are different since we can't rely on the user having chosen to differentiate them.
Kubernetes support for multiple schema versions of CRDs is pretty limited: you have to declare the schema in one document. But within that constraint, being able to gradually roll out a new operator version, one cluster at a time, might be quite nice. However, we'd want to end up with a converged fleet, and I haven't put any thought at all into the best way to manage this thus far: to date, just running the current operator and managing the fleet via lots of Kubernetes has been fine.
The spark that generated this bug report was accidentally running 2 copies of the operator and causing havoc as a result.
One possible solution for this is to run the operator with a label selector: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/manual/operations.md#sharding-for-the-operator and give each operator deployment a unique set of labels to work with, which also reduces the blast radius (see the sketch below). There is still a limitation: if someone creates another deployment matching the same labels there will be a "conflict", and the same goes for a deployment started without a label selector.
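In controller-runtime terms, this sharding boils down to an event filter like the following sketch; the label key and value are examples, not ones the docs prescribe:

```go
package sharding

import (
	"k8s.io/apimachinery/pkg/labels"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// selectorPredicate keeps only objects matching the selector, so an operator
// started with e.g. "fdb-operator-shard=team-a" (an example label) never
// sees clusters that belong to another shard.
func selectorPredicate(selector labels.Selector) predicate.Predicate {
	return predicate.NewPredicateFuncs(func(obj client.Object) bool {
		return selector.Matches(labels.Set(obj.GetLabels()))
	})
}
```

Two deployments with disjoint selectors then reconcile disjoint sets of clusters; the failure modes are exactly the ones above, overlapping or missing selectors.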
Another solution could be to let the operator initially write its "identity" (e.g. namespace and deployment name) into a predefined key space, and ignore the cluster if the recorded identity doesn't match its own. In the end this solution has the same limitation as the one with the label selector: if someone configures something wrong, there might still be conflicts.
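Sketched with the FoundationDB Go bindings, and leaning on the fact that FDB transactions are serializable, so two operators cannot both observe an empty key and both win. The key itself and the identity format are assumptions for illustration:

```go
package identity

import (
	"bytes"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
)

// claimIdentity records this operator's identity (e.g. "namespace/deployment")
// under a predefined key on first contact, and reports whether the recorded
// identity matches ours. The key below is hypothetical; a real implementation
// would reserve a dedicated key space.
func claimIdentity(db fdb.Database, identity []byte) (bool, error) {
	key := fdb.Key("operator/identity")
	owned, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		existing := tr.Get(key).MustGet()
		if existing == nil {
			// First operator to reach the cluster records itself; the
			// serializable transaction makes this a safe compare-and-set.
			tr.Set(key, identity)
			return true, nil
		}
		return bytes.Equal(existing, identity), nil
	})
	if err != nil {
		return false, err
	}
	return owned.(bool), nil
}
```

An operator that gets false back would log and skip the cluster rather than reconcile it.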
How should we proceed with this issue? I would tend to close it, since the operator provides some possible safeguards with the label selector.
I'm going to close this issue (please feel free to reopen it if you think it's not resolved), as this was a configuration problem and I'm not aware of any ongoing efforts to solve it.