aks-engine
Clusters in bridge mode may fail from unattended upgrades
Summary
In the last 48 hours, some AKS Engine clusters have had nodes fall into a NotReady state, or have lost connectivity to the cluster entirely when all control plane nodes were affected.
To remediate this situation, reboot the affected nodes, starting with the control plane. Node VMs can be rebooted in the Azure portal or with the az command line tool:
- az vm restart
- az vmss restart
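The reboot order described above (control plane first, then the remaining nodes) can be sketched in Python as a thin wrapper around az vm restart. This is only an illustration: the helper names restart_commands and restart_all are hypothetical, and the sketch assumes that control plane VM names contain "master", which you should check against your own resource group. It defaults to a dry run that only prints the commands.

```python
import subprocess


def restart_commands(resource_group, vm_names):
    """Build one `az vm restart` command per VM, control plane VMs first."""
    # Assumption: control plane VM names contain "master".
    # sorted() is stable, so the original order is kept within each group.
    ordered = sorted(vm_names, key=lambda name: "master" not in name)
    return [
        ["az", "vm", "restart", "--resource-group", resource_group, "--name", name]
        for name in ordered
    ]


def restart_all(resource_group, vm_names, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for argv in restart_commands(resource_group, vm_names):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops at the first failed restart.
            subprocess.run(argv, check=True)
```

Calling restart_all("my-resource-group", names) prints the commands for review; pass dry_run=False to run them. Nodes that belong to a scale set would need az vmss restart instead, which this sketch does not cover.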
Describe the bug
Before October 2020, AKS Engine releases defaulted to creating clusters with "bridge" Kubernetes NetworkMode. A recent Ubuntu update to the systemd package may result in this configuration being broken and the node becoming NotReady. If this happens on control plane nodes, the cluster may become unavailable.
Rebooting the node should recreate a working network configuration. Clusters created recently use "transparent" networking mode by default and will not see this problem.
AKS Engine version
Versions before v0.58.0, or any version with an API model not using "transparent" Kubernetes networking mode.
Kubernetes version
Any.
Additional context
The root issue is described here: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1937117
- done: add instructions for detecting whether bridge mode is enabled
- done: add instructions for disabling unattended upgrades
- TODO: update docs to discourage choosing bridge mode?
- TODO: add a deployment warning in code about bridge mode?
Q: How can I tell if I'm using the problematic "bridge" mode or the preferred "transparent" networking mode?
A: Look at the cluster template (aka "API model") used to deploy your cluster, probably found in a location like _output/mycluster/apimodel.json. The kubernetesConfig near the top of the file will have a networkMode value:
{
  "apiVersion": "vlabs",
  "location": "southeastasia",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.22",
      "orchestratorVersion": "1.22.0-beta.0",
      "kubernetesConfig": {
        "kubernetesImageBase": "mcr.microsoft.com/",
        "kubernetesImageBaseType": "mcr",
        "mcrKubernetesImageBase": "mcr.microsoft.com/",
        "clusterSubnet": "10.239.0.0/16",
        "dnsServiceIP": "10.0.0.10",
        "serviceCidr": "10.0.0.0/16",
        "networkPlugin": "azure",
        "networkMode": "transparent",
        "containerRuntime": "docker",
        ...
If the value is not "transparent" and you are experiencing connectivity problems with nodes or with the cluster, consider deploying a new cluster with a current version of AKS Engine and a new cluster template. AKS Engine v0.58.0 and later default to transparent mode in the clusters they create.
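For anyone who wants to check several clusters at once, the lookup above can be sketched in Python. The key path (properties, orchestratorProfile, kubernetesConfig, networkMode) comes from the example API model; the function names are hypothetical, and how a missing networkMode key should be interpreted is left to the caller.

```python
import json


def get_network_mode(apimodel_path):
    """Return the networkMode value from an API model file, or None if absent."""
    with open(apimodel_path, encoding="utf-8") as f:
        node = json.load(f)
    # Walk down to kubernetesConfig, tolerating missing or null sections.
    for key in ("properties", "orchestratorProfile", "kubernetesConfig"):
        node = node.get(key) or {}
    return node.get("networkMode")


def is_transparent(apimodel_path):
    """True only when the API model explicitly sets transparent mode."""
    return get_network_mode(apimodel_path) == "transparent"
```

For example, is_transparent("_output/mycluster/apimodel.json") returns False for a cluster template that sets "bridge" or that has no networkMode value at all.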
Q: How can I disable unattended upgrades for my AKS Engine cluster?
A: AKS Engine doesn't provide a way to toggle this behavior off (see #4594). By default, Ubuntu clusters created with AKS Engine will install updates in the background for security and bug fixes.
The unattended upgrades behavior can be disabled on running nodes by writing the file /etc/apt/apt.conf.d/99periodic with 0644 permissions and these contents to each affected VM:
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
APT::Periodic::Unattended-Upgrade "0";
But note that this is not a durable fix: scaling the cluster will result in new nodes with unattended upgrades enabled.
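As a rough sketch, the file described above could be written by a small Python script run as root on each affected VM. The function name disable_unattended_upgrades is hypothetical; the path, permissions, and contents are the ones given above. How the script reaches each node (SSH, a DaemonSet, or something else) is not covered here.

```python
import os

# Contents exactly as listed above: every periodic APT task set to "0".
PERIODIC_CONFIG = (
    'APT::Periodic::Update-Package-Lists "0";\n'
    'APT::Periodic::Download-Upgradeable-Packages "0";\n'
    'APT::Periodic::AutocleanInterval "0";\n'
    'APT::Periodic::Unattended-Upgrade "0";\n'
)


def disable_unattended_upgrades(path="/etc/apt/apt.conf.d/99periodic"):
    """Write the APT periodic config and set its permissions to 0644."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(PERIODIC_CONFIG)
    # chmod explicitly so the result does not depend on the process umask.
    os.chmod(path, 0o644)
```

Writing to the default path requires root. The same caveat applies as for the manual steps: nodes added later by scaling will not have this file.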
Update: see #4614.
TODO: update docs to discourage choosing bridge mode?
Done in https://github.com/Azure/aks-engine/pull/4857.
Closing per @mboersma as this is fixed and/or documented in all current, supported versions.