
kubernetes_manifest: Terraform often fails with "http2: server sent GOAWAY and closed the connection"

Open · papanito opened this issue 3 years ago · 6 comments

Terraform Version, Provider Version and Kubernetes Version

Terraform v1.3.2
on windows_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.14.0
+ provider registry.terraform.io/hashicorp/helm v2.7.1
+ provider registry.terraform.io/hashicorp/kubernetes v2.11.0
+ provider registry.terraform.io/rancher/rancher2 v1.22.2

Affected Resource(s)

  • kubernetes_manifest

Terraform Configuration Files

provider.tf:

terraform {
  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = "~>1.22.2"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~>2.11.0"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = "~>1.14.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~>2.7.1"
    }
  }

  backend "azurerm" {
    ....
  }
}

provider "rancher2" {
  api_url    = var.RANCHER_NOP_API_URL
  access_key = var.RANCHER_NOP_TOKEN
  secret_key = var.RANCHER_NOP_SECRET
}

provider "kubernetes" {
  host  = "${var.RANCHER_NOP_API_URL}/k8s/clusters/${rancher2_cluster.cluster.id}"
  token = "${var.RANCHER_NOP_TOKEN}:${var.RANCHER_NOP_SECRET}"
}

provider "kubectl" {
  load_config_file = "false"
  host             = "${var.RANCHER_NOP_API_URL}/k8s/clusters/${rancher2_cluster.cluster.id}"
  token            = "${var.RANCHER_NOP_TOKEN}:${var.RANCHER_NOP_SECRET}"
}

provider "helm" {
  kubernetes {
    host  = "${var.RANCHER_NOP_API_URL}/k8s/clusters/${rancher2_cluster.cluster.id}"
    token = "${var.RANCHER_NOP_TOKEN}:${var.RANCHER_NOP_SECRET}"
  }
}

module/gatekeeper/gatekeeper.tf:

resource "kubernetes_manifest" "opa_config" {
  manifest = {
    apiVersion = "config.gatekeeper.sh/v1alpha1"
    kind = "Config"
    metadata = {
      name = "config"
      namespace = "cattle-gatekeeper-system"
      labels = {
        team = "skywalkers"
      }
    }
    spec = {
      match = [{
        excludedNamespaces = ["kube-*", "cattle-*"]
        processes = ["*"]
      }]
    }
  }
}

Debug Output

Panic Output

N/A

Steps to Reproduce

  1. terraform plan

Expected Behavior

Plan succeeds without error

Actual Behavior

Plan fails with an error like this:

│   with module.gatekeeper.kubernetes_manifest.opa_config,
│   on .terraform\modules\gatekeeper\gatekeeper\main.tf line 1934, in resource "kubernetes_manifest" "opa_config":
│ 1934: resource "kubernetes_manifest" "opa_config" {
│
│ The plugin returned an unexpected error from plugin.(*GRPCProvider).UpgradeResourceState: rpc
│ error: code = Unknown desc = failed to determine resource type ID: cannot get OpenAPI foundry:
│ failed get OpenAPI spec: http2: server sent GOAWAY and closed the connection; LastStreamID=199,
│ ErrCode=NO_ERROR, debug=""

Important Factoids

N/A

References

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

papanito · Dec 09 '22 11:12

This smells like authentication issues, but it's also the first time I've heard of that type of reply from the API server (GOAWAY) 😄

Need to look into potential causes for that error message.

alexsomesan avatar Dec 14 '22 09:12 alexsomesan

Yeah, not very friendly; at least a "please" would be nice 😄 It's pretty random, and after it occurs, a subsequent terraform plan often succeeds.

papanito · Dec 14 '22 10:12

Any update on this? I am facing this issue as well, but I keep getting the same error over and over again. A temporary fix seems to be to destroy and recreate the certificate, or to run plan/apply with -refresh=false, but these are just temporary hacks.
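A minimal sketch of the -refresh=false workaround mentioned above (plain Terraform CLI assumed; adjust for any wrapper or CI tooling you use):

# Skip refreshing existing state so the provider does not hit the API server for every resource
terraform plan -refresh=false
terraform apply -refresh=false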

These are my versions:

Terraform v1.4.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/kubernetes v2.19.0

and resources

resource "kubernetes_manifest" "selfsigned-ca-issuer" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "ClusterIssuer"
    metadata   = {
      name = "selfsigned-ca-issuer"
    }
    spec = {
      selfSigned = {}
    }
  }
}

resource "kubernetes_manifest" "selfsigned-star-certificate" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "Certificate"
    metadata   = {
      name      = "selfsigned-star-certificate"
      namespace = "default"
    }
    spec = {
      commonName = "*.${var.base_hostname}"
      dnsNames   = [
        "*.${var.base_hostname}"
      ]
      secretName = "selfsigned-star-certificate"
      privateKey = {
        algorithm = "RSA"
        size      = 4096
      }
      issuerRef = {
        name  = kubernetes_manifest.selfsigned-ca-issuer.manifest.metadata.name
        kind  = "ClusterIssuer"
        group = "cert-manager.io"
      }
    }
  }
}

data "kubernetes_secret_v1" "star-certificate" {
  metadata {
    name      = kubernetes_manifest.selfsigned-star-certificate.manifest.spec.secretName
    namespace = kubernetes_manifest.selfsigned-star-certificate.manifest.metadata.namespace
  }
}

After running terraform plan I keep getting:

module.services.kubernetes_manifest.selfsigned-ca-issuer: Refreshing state...
module.services.kubernetes_manifest.selfsigned-star-certificate: Refreshing state...

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: Plugin error
│ 
│   with module.services.kubernetes_manifest.selfsigned-star-certificate,
│   on services/certificates.tf line 14, in resource "kubernetes_manifest" "selfsigned-star-certificate":
│   14: resource "kubernetes_manifest" "selfsigned-star-certificate" {
│ 
│ The plugin returned an unexpected error from plugin.(*GRPCProvider).PlanResourceChange: rpc error: code = Unknown desc = failed to determine resource type ID: failed to look up GVK [cert-manager.io/v1, Kind=Certificate] among
│ available CRDs: unexpected error when reading response body. Please retry. Original error: http2: server sent GOAWAY and closed the connection; LastStreamID=199, ErrCode=NO_ERROR, debug=""

santimar · Apr 17 '23 15:04

> This smells like authentication issues, but it's also the first time I've heard of that type of reply from the API server (GOAWAY) 😄
>
> Need to look into potential causes for that error message.

@alexsomesan After some investigation, it seems to be a feature of the API server that can be enabled when you have a load balancer and multiple control plane nodes.

As you can see here: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/

One of the parameters is --goaway-chance float

To prevent HTTP/2 clients from getting stuck on a single apiserver, randomly close a connection (GOAWAY). The client's other in-flight requests won't be affected, and the client will reconnect, likely landing on a different apiserver after going through the load balancer again. This argument sets the fraction of requests that will be sent a GOAWAY. Clusters with single apiservers, or which don't use a load balancer, should NOT enable this. Min is 0 (off), Max is .02 (1/50 requests); .001 (1/1000) is a recommended starting point.

I only get this error on kubernetes_manifest resources though, so maybe it needs deeper investigation.
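To see whether that flag is in play, a rough check (assuming a kubeadm-style control plane where the API server runs as static pods labelled component=kube-apiserver; the label and namespace are assumptions for other setups) would be something like:

# Look for the goaway-chance flag in the running apiserver pod specs
kubectl -n kube-system get pods -l component=kube-apiserver -o yaml | grep goaway-chance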

santimar · Apr 27 '23 16:04

^ We're getting the same error but for other resources! Has there been a fix for this?

aaj-synth · Nov 10 '23 21:11

@aaj-synth I was able to fix this error by using multiple API servers and putting a load balancer in front of the cluster, but setting --goaway-chance=0 should also work. I know it's not the fix you are looking for, but it works for now.
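For completeness, a sketch of how that flag could be disabled on a kubeadm-managed control plane (the manifest path and flag placement are assumptions; on managed clusters this flag is usually not configurable at all):

# On each control plane node, edit the kube-apiserver static pod manifest
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
# Under spec.containers[0].command, add or adjust the flag:
#   - --goaway-chance=0
# The kubelet detects the manifest change and restarts the apiserver automatically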

santimar · Nov 12 '23 11:11