Clusters in bridge mode may fail from unattended upgrades

Open mboersma opened this issue 3 years ago • 5 comments

Summary

In the last 48 hours, some AKS Engine clusters have had nodes fall into a NotReady state, or have lost connectivity to the cluster entirely when all control plane nodes were affected.

To remediate this situation, reboot the affected nodes, starting with the control plane. Node VMs can be rebooted in the Azure portal or with the az command line tool (az vm restart).
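As a rough sketch of the reboot step, the helper below restarts VMs in the order they are listed, so control plane VMs should be passed first. The function name restart_nodes, the DRY_RUN switch, and the resource group and VM names in the usage line are illustrative assumptions, not part of AKS Engine. By default it only prints the az commands so they can be reviewed before anything is rebooted.

```shell
# Sketch only: reboot AKS Engine node VMs in the order given.
# The resource group and VM names passed in are placeholders (assumptions).
# DRY_RUN defaults to 1, which prints each az command instead of running it.

restart_nodes() {
  rn_group="$1"
  shift
  for rn_vm in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "az vm restart --resource-group $rn_group --name $rn_vm"
    else
      az vm restart --resource-group "$rn_group" --name "$rn_vm"
    fi
  done
}
```

For example, `restart_nodes my-cluster-rg k8s-master-0 k8s-agentpool-0` prints two commands, control plane first. Setting DRY_RUN=0 runs them for real, which requires a logged-in az CLI.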

Describe the bug

Before October 2020, AKS Engine releases defaulted to creating clusters with "bridge" Kubernetes NetworkMode. A recent Ubuntu update to the systemd package can break this configuration, leaving the node NotReady. If this happens on control plane nodes, the cluster may become unavailable.

Rebooting the node should recreate a working network configuration. Clusters created recently use "transparent" networking mode by default and will not see this problem.

AKS Engine version

Versions before v0.58.0, or any version with an API model not using "transparent" Kubernetes networking mode.

Kubernetes version

Any.

Additional context

The root issue is described here: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1937117

mboersma Jul 21 '21 20:07

done: add instructions for detecting whether bridge mode is enabled
done: add instructions for disabling unattended upgrades
TODO: update docs to discourage choosing bridge mode?
TODO: add a deployment warning in code about bridge mode?

mboersma Jul 21 '21 20:07

Q: How can I tell if I'm using the problematic "bridge" mode or the preferred "transparent" networking mode?

A: Look at the cluster template (aka "API model") used to deploy your cluster, probably found in a location like _output/mycluster/apimodel.json. The kubernetesConfig near the top of the file will have a networkMode value:

{
  "apiVersion": "vlabs",
  "location": "southeastasia",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.22",
      "orchestratorVersion": "1.22.0-beta.0",
      "kubernetesConfig": {
        "kubernetesImageBase": "mcr.microsoft.com/",
        "kubernetesImageBaseType": "mcr",
        "mcrKubernetesImageBase": "mcr.microsoft.com/",
        "clusterSubnet": "10.239.0.0/16",
        "dnsServiceIP": "10.0.0.10",
        "serviceCidr": "10.0.0.0/16",
        "networkPlugin": "azure",
        "networkMode": "transparent",
        "containerRuntime": "docker",
...

If the value is not "transparent" and you are experiencing connectivity problems to nodes or to the cluster, consider deploying a new cluster with a current version of AKS Engine using a new cluster template. AKS Engine v0.58.0 and later default to transparent mode in the clusters they create.
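For a quick check of the API model file without opening it, something like the following could work. It is a text-matching sketch rather than a real JSON parser, and the function name network_mode and the "unset" fallback are assumptions made for illustration.

```shell
# Sketch only: print the first networkMode value found in an API model file.
# Uses plain text matching, so it assumes the key appears as "networkMode": "value".
# Prints "unset" when no non-empty value is found.

network_mode() {
  nm_value=$(grep -o '"networkMode"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/')
  echo "${nm_value:-unset}"
}
```

Running `network_mode _output/mycluster/apimodel.json` prints the configured mode. If jq is installed, reading `.properties.orchestratorProfile.kubernetesConfig.networkMode` from the file is a more robust alternative.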

mboersma Jul 22 '21 17:07

Q: How can I disable unattended upgrades for my AKS Engine cluster?

A: AKS Engine doesn't provide a way to toggle this behavior off (see #4594). By default, Ubuntu clusters created with AKS Engine will install updates in the background for security and bug fixes.

The unattended upgrades behavior can be disabled on running nodes by writing the file /etc/apt/apt.conf.d/99periodic, with 0644 permissions and these contents, on each affected VM:

    APT::Periodic::Update-Package-Lists "0";
    APT::Periodic::Download-Upgradeable-Packages "0";
    APT::Periodic::AutocleanInterval "0";
    APT::Periodic::Unattended-Upgrade "0";

But note that this is not a durable fix: scaling the cluster will result in new nodes with unattended upgrades enabled.
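The file can be written by hand, or with a small helper along these lines. The function name write_periodic_conf and its optional path argument are assumptions added so the sketch can be tried against a scratch file first; the default path, the contents, and the 0644 mode are the ones given above.

```shell
# Sketch only: write the apt periodic settings that turn off unattended upgrades.
# With no argument it targets /etc/apt/apt.conf.d/99periodic (needs root);
# passing another path is an illustrative convenience for trying it out safely.

write_periodic_conf() {
  wp_target="${1:-/etc/apt/apt.conf.d/99periodic}"
  cat > "$wp_target" <<'EOF'
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
APT::Periodic::Unattended-Upgrade "0";
EOF
  chmod 0644 "$wp_target"
}
```

Run `write_periodic_conf` as root on each affected VM. Nodes added later by scaling will not have the file, so it would need to be applied to them as well.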

Update: see #4614.

mboersma Jul 22 '21 17:07

🥲

thewisenerd Jul 26 '21 01:07

TODO: update docs to discourage choosing bridge mode?

Done in https://github.com/Azure/aks-engine/pull/4857.

bridgetkromhout Mar 17 '22 19:03

Closing per @mboersma as this is fixed and/or documented in all current, supported versions.

bridgetkromhout Sep 13 '22 22:09