k8s-bigip-ctlr
Helm chart should allow replicas, can cause production disruption
Description
During node upgrades, workloads are rescheduled and moved multiple times. While CIS itself is being rescheduled, the configuration on the F5 can fall out of date with reality, because the new node IPs are not recorded in the pool.
Solution Proposed
The helm chart should allow you to specify replicas and a PodDisruptionBudget so that at least one CIS pod is running at all times. If CIS cannot operate with multiple replicas, then that should be addressed.
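For illustration, a minimal sketch of the kind of PodDisruptionBudget the chart could render. It assumes CIS can run with two or more replicas, which is not the case today according to the replies in this thread. The resource name, namespace, and the `app: k8s-bigip-ctlr` label are hypothetical and would need to match whatever labels the chart actually puts on the CIS pods.

```yaml
# Sketch only: name, namespace, and labels are assumptions, not chart output.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: k8s-bigip-ctlr-pdb      # hypothetical name
  namespace: kube-system        # hypothetical namespace
spec:
  # Keep at least one CIS pod running during voluntary disruptions
  # such as node drains.
  minAvailable: 1
  selector:
    matchLabels:
      app: k8s-bigip-ctlr       # hypothetical label; must match the CIS pods
```

With a single replica, `minAvailable: 1` would block voluntary eviction of the CIS pod, so a budget like this is only useful once two or more replicas are supported.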
Alternatives
There are no alternatives.
@braunsonm - We suggest 1 replica for any CIS deployment. Any other thoughts on this?
@trinaths - We also had a phone call with Mark D about this, if you want to confer with him. Braunsonm may want to respond, but consider this scenario: you are upgrading your cluster and you take down the node that CIS is running on. CIS is restarted on an upgraded node, but that node cannot pull the CIS image for some reason, for example because your registry mirror is not yet available or is broken. You'd be stuck in a state where CIS is not running and the config on the BIG-IP is stale.
Or this scenario: you have CIS running on the same node as one or more of your application pods. This node fails. CIS is restarted on another node. Perhaps your app has restarted on another node faster than CIS, or perhaps your app had some pods that were not affected by the node failure. The BIG-IP config is stale for the time it takes CIS to restart (which may include pulling the image) and then update the BIG-IP. This could mean a longer outage than necessary (if app pods restart faster than CIS reconfigures the BIG-IP), or it could mean unequal distribution of load during this time (if some healthy app pods remain in the BIG-IP config but others are missing because the config is stale).
These edge cases are currently possible because the replica count is 1 for any CIS deployment.
@mikeoleary - Is this requirement still valid to consider?
@trinaths - Since CIS only supports a replica of 1, I don't think it's within the scope of your roadmap for now.
In the future, if there are plans to have CIS support a replica count of 2 or more, then we can make sure the helm chart supports a PodDisruptionBudget.
Thank you. Closing this issue as not planned for CIS.