terraform-google-kubernetes-engine

Node Pools aren't being set w/ automatic node repair, node updates, or Autoscaling

Open lesv opened this issue 7 years ago • 8 comments

My input:

module "gke-cluster" {
  source = "google-terraform-modules/kubernetes-engine/google"
  version = "1.19.1"

  general = {
    name = "${var.cluster_name}"
    env  = "${var.environment}"
    zone = "${var.gcp_zone}"
  }

  master = {
    enable_kubernetes_alpha = true
    username = "admin"
    password = "${random_string.password.result}"
  }

  default_node_pool = {
    node_count = 3
    machine_type = "${var.node_machine_type}"
    disk_size_gb = "${var.node_disk_size}"
    disk_type = "pd-ssd"
    oauth_scopes =   "https://www.googleapis.com/auth/compute,https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management,https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/pubsub,https://www.googleapis.com/auth/datastore"

#    autoscaling {   ## I've tried both with this block present and with it commented out.
#      min_node_count = 1
#      max_node_count = 10
#    }

#    management {  ##  DEFAULTS to TRUE so it should just work, but it's not as of 10/30 PM
#     auto_repair = true
#     auto_upgrade= true
#    }
  }

  node_pool = []
}

I get (screenshot attached): screen shot 2018-10-30 at 10 46 52 pm

With things commented out looking in terraform.tfstate:

                "google_container_cluster.new_container_cluster": {
                    "type": "google_container_cluster",
                    "depends_on": [
                        "data.google_container_engine_versions.region",
                        "local.name_prefix"
                    ],
                    "primary": {
                        "id": "knative-dev-us-west1-c-master",
                        "attributes": {
 .
 .
 .
                            "id": "knative-dev-us-west1-c-master",
 .
 .
 .
                            "name": "knative-dev-us-west1-c-master",
 .
 .
 .
                            "node_pool.#": "1",
                            "node_pool.0.autoscaling.#": "0",
                            "node_pool.0.initial_node_count": "3",
                            "node_pool.0.instance_group_urls.#": "1",
                            "node_pool.0.instance_group_urls.0": "https://www.googleapis.com/compute/v1/projects/lesv-008/zones/us-west1-c/instanceGroupManagers/gke-knative-dev-us-west1-default-pool-68956134-grp",
                            "node_pool.0.management.#": "1",
                            "node_pool.0.management.0.auto_repair": "false",
                            "node_pool.0.management.0.auto_upgrade": "false",
                            "node_pool.0.max_pods_per_node": "0",
                            "node_pool.0.name": "default-pool",
 .
 .
 .

I would expect either to set it explicitly or, per the comments in the code, to get that behavior as the default. I'll look again in the AM in case of operator error, as I'm very much a newbie with Terraform, GKE, and Knative (though I've built several clusters by hand).

lesv avatar Oct 31 '18 05:10 lesv

I also tried just setting:

      min_node_count = 1
      max_node_count = 10

      auto_repair = true
      auto_upgrade = true

It failed inside default_node_pool, but worked inside a node_pool.
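For reference, this is roughly what the working entry looked like. It's a sketch that assumes the module's node_pool list accepts the same flat keys as default_node_pool; the exact schema in 1.19.1 may differ, and the pool name and sizes here are placeholders:

  node_pool = [
    {
      name         = "extra-pool"                  # hypothetical pool name
      node_count   = 3
      machine_type = "${var.node_machine_type}"
      disk_size_gb = "${var.node_disk_size}"

      min_node_count = 1
      max_node_count = 10

      auto_repair  = true
      auto_upgrade = true
    },
  ]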

I tried just creating a single node_pool and commenting out default_node_pool, but that gave me two node pools, and the default pool had some really bad defaults.

lesv avatar Oct 31 '18 21:10 lesv

So, I tried again, and still no success.

module "gke-cluster" {
  source = "google-terraform-modules/kubernetes-engine/google"
  version = "1.19.1"

  general = {
    name = "${var.cluster_name}"
    env  = "${var.environment}"
    zone = "${var.gcp_zone}"
  }

  master = {
#    enable_kubernetes_alpha = true # disables autoRepair & autoUpdate
    username = "admin"
    password = "${random_string.password.result}"

    disable_kubernetes_dashboard = false
    monitoring_service = "monitoring.googleapis.com"
    maintenance_window = "02:15"
  }

  default_node_pool = {
    node_count = 3
    machine_type = "${var.node_machine_type}"
    disk_size_gb = "${var.node_disk_size}"
    disk_type = "pd-ssd"
    oauth_scopes = "${join(",", var.scopes )}"

    min_node_count = 1
    max_node_count = 10

    auto_repair = true
    auto_upgrade = true
  }
}

lesv avatar Oct 31 '18 22:10 lesv

Currently there is no way to enable autoscaling or auto repair on the default node pool with the Google provider ...

Nothing in the docs: https://www.terraform.io/docs/providers/google/r/container_cluster.html#disk_size_gb

And nothing in the code: https://github.com/terraform-providers/terraform-provider-google/blob/51e63bfff2d2acba78bdbb35227669b820a4d61e/google/node_config.go

Personally I often delete the default pool, but I think this should be raised as an issue on the provider.
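For anyone hitting this without the module: the pattern of dropping the default pool and managing a separate google_container_node_pool resource (which does support autoscaling and management) looks roughly like the sketch below. Resource names and sizes are placeholders, not the module's actual internals:

resource "google_container_cluster" "cluster" {
  name = "${var.cluster_name}"
  zone = "${var.gcp_zone}"

  # Create the default pool only briefly, then delete it; the real pool is managed below
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary" {
  name       = "primary-pool"                            # hypothetical name
  cluster    = "${google_container_cluster.cluster.name}"
  zone       = "${var.gcp_zone}"
  node_count = 3

  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    machine_type = "${var.node_machine_type}"
    disk_size_gb = "${var.node_disk_size}"
  }
}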

perriea avatar Nov 01 '18 13:11 perriea

I can do it with the gcloud command (and get the right result) when I run:

gcloud container clusters create $CLUSTER_NAME \
  --zone=$CLUSTER_ZONE \
  --cluster-version=latest \
  --machine-type=n1-standard-4 \
  --enable-autoscaling --min-nodes=1 --max-nodes=10 \
  --enable-autorepair \
  --scopes=service-control,service-management,compute-rw,storage-ro,cloud-platform,logging-write,monitoring-write,pubsub,datastore \
  --num-nodes=3

lesv avatar Nov 01 '18 15:11 lesv

Ah - I think I understand: we need to fix the Go code.

lesv avatar Nov 01 '18 15:11 lesv

I ended up switching to the beta provider and using resources directly (and that worked for me):

resource "google_container_cluster" "gke_cluster" {
  name               = "${var.cluster_name}"
  zone               = "${var.gcp_zone}"
  min_master_version = "${var.master_version}"

  master_auth {
    username = "admin"
    password = "${random_string.password.result}"
  }

  addons_config {
    kubernetes_dashboard {
      disabled = false
    }
  }

  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"

  maintenance_policy {
    daily_maintenance_window {
      start_time = "02:10"
    }
  }

  lifecycle {
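    # Ignore node_pool drift (e.g. node counts changed outside Terraform) so plans don't try to revert it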
    ignore_changes = ["node_pool"]
  }

  node_pool {
    name       = "default-pool"
    node_count = "${var.min_node_count}"

    autoscaling {
      min_node_count = "${var.min_node_count}"
      max_node_count = "${var.max_node_count}"
    }

    management {
      auto_upgrade = true
      auto_repair  = true
    }

    node_config {
      oauth_scopes = "${var.scopes}"

      machine_type = "${var.node_machine_type}"
      disk_size_gb = "${var.node_disk_size}"
      disk_type = "pd-ssd"
    }
  }
}

lesv avatar Nov 02 '18 00:11 lesv

Thank you @lesv, I will look at this in the beta provider to see whether I missed something in the stable version 👍

perriea avatar Nov 03 '18 09:11 perriea

The standard provider seems to work fine for me with @lesv's solution.

nhooyr avatar Mar 13 '19 18:03 nhooyr