terraform-aws-eks-blueprints

gavinbunney/kubectl provider used in vpc-cni-custom-networking is behaving inconsistently

Open • fbozic opened this issue 2 years ago • 11 comments

Description

There is a bug in the gavinbunney/kubectl provider, which is used in the vpc-cni-custom-networking example. The bug seems to occur only on Kubernetes 1.27. The provider also appears to be unmaintained (example 1, example 2, example 3).

I followed the vpc-cni-custom-networking example and made no changes to the relevant code: the gavinbunney/kubectl provider and the kubectl_manifest resources.

I created a new EKS v1.27 cluster, and terraform apply succeeded. I then ran kubectl get eniconfigs.crd.k8s.amazonaws.com to inspect the created ENIConfigs, and the manifests looked fine. Next I ran terraform plan to check that the Terraform state was in sync. It was not: Terraform wanted to create the ENIConfigs again. The attached Terraform debug logs show the ENIConfig resources being removed from the state because they could not be found. Importing the resources back fails. After some time (roughly 20 to 30 minutes after the last kubectl interaction with eniconfigs.crd.k8s.amazonaws.com), terraform plan reports no changes again.

I've found an alternative that uses the hashicorp/helm provider with an empty generic Helm chart. I can open a PR with this change for the vpc-cni-custom-networking example if switching providers makes sense.

Old code:

resource "kubectl_manifest" "eni_config" {
  for_each = zipmap(var.azs, var.pods_subnet_ids)

  yaml_body = yamlencode({
    apiVersion = "crd.k8s.amazonaws.com/v1alpha1"
    kind       = "ENIConfig"
    metadata = {
      name = each.key
    }
    spec = {
      securityGroups = [
        module.eks.cluster_primary_security_group_id,
        module.eks.node_security_group_id,
      ]
      subnet = each.value
    }
  })
}

New code:

resource "helm_release" "eni_configs" {
  name       = "eni-configs"
  repository = "https://dysnix.github.io/charts"
  chart      = "raw"
  version    = "v0.3.2"
  values = [
    yamlencode({
      resources = [
        for az, subnet_id in zipmap(var.azs, var.pods_subnet_ids) :
        {
          apiVersion = "crd.k8s.amazonaws.com/v1alpha1"
          kind       = "ENIConfig"
          metadata = {
            name = az
          }
          spec = {
            securityGroups = [
              module.eks.cluster_primary_security_group_id,
              module.eks.node_security_group_id,
            ]
            subnet = subnet_id
          }
        }
      ]
    })
  ]
}
  • [x] ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]:
    I'm using only examples from this repository. Here is permalink: https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/a3138bd8ee2dba924e01a3fc265b84402a0246bc/examples/vpc-cni-custom-networking

  • Terraform version:

Terraform v1.5.2
on darwin_arm64
  • Provider version(s):
Terraform v1.5.2
on darwin_arm64
+ provider registry.terraform.io/gavinbunney/kubectl v1.14.0
+ provider registry.terraform.io/hashicorp/aws v5.3.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.2
+ provider registry.terraform.io/hashicorp/helm v2.10.1
+ provider registry.terraform.io/hashicorp/kubernetes v2.21.1
+ provider registry.terraform.io/hashicorp/time v0.9.1
+ provider registry.terraform.io/hashicorp/tls v4.0.4

Reproduction Code [Required]

  • terraform apply
  • kubectl get eniconfigs.crd.k8s.amazonaws.com
  • terraform plan -> kubectl resources are dropped from the state

Expected behaviour

terraform plan is consistent every time and ENIConfigs do not disappear from the state.

Actual behaviour

terraform plan is not consistent every time and ENIConfigs disappear from the state.

Terminal Output Screenshot(s)

Terraform plan debug logs (just the interesting part):

2023-07-04T18:42:27.207+0200 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2023/07/04 18:42:27 [WARN] kubernetes resource (/apis/crd.k8s.amazonaws.com/v1alpha1/eniconfigs/eu-central-1b) not found, removing from state
2023-07-04T18:42:27.207+0200 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2023/07/04 18:42:27 [WARN] kubernetes resource (/apis/crd.k8s.amazonaws.com/v1alpha1/eniconfigs/eu-central-1c) not found, removing from state
2023-07-04T18:42:27.207+0200 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2023/07/04 18:42:27 [WARN] kubernetes resource (/apis/crd.k8s.amazonaws.com/v1alpha1/eniconfigs/eu-central-1a) not found, removing from state
2023-07-04T18:42:27.207+0200 [WARN]  Provider "registry.terraform.io/gavinbunney/kubectl" produced an unexpected new value for module.eks_euc1.kubectl_manifest.eni_config["eu-central-1a"] during refresh.
2023-07-04T18:42:27.207+0200 [WARN]  Provider "registry.terraform.io/gavinbunney/kubectl" produced an unexpected new value for module.eks_euc1.kubectl_manifest.eni_config["eu-central-1c"] during refresh.
2023-07-04T18:42:27.207+0200 [WARN]  Provider "registry.terraform.io/gavinbunney/kubectl" produced an unexpected new value for module.eks_euc1.kubectl_manifest.eni_config["eu-central-1b"] during refresh.
2023-07-04T18:42:27.209+0200 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2023/07/04 18:42:27 [DEBUG] eu-central-1b Unstructed YAML: map[apiVersion:crd.k8s.amazonaws.com/v1alpha1 kind:ENIConfig metadata:map[name:eu-central-1b] spec:map[securityGroups:[sg-0329cc8efeae8b0cf sg-0efb5545ec1b733ef] subnet:subnet-073fc7974ff152aa8]]
2023-07-04T18:42:27.209+0200 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2023/07/04 18:42:27 [DEBUG] eu-central-1c Unstructed YAML: map[apiVersion:crd.k8s.amazonaws.com/v1alpha1 kind:ENIConfig metadata:map[name:eu-central-1c] spec:map[securityGroups:[sg-0329cc8efeae8b0cf sg-0efb5545ec1b733ef] subnet:subnet-0200bdef6e567c74f]]
2023-07-04T18:42:27.209+0200 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2023/07/04 18:42:27 [DEBUG] eu-central-1b Unstructed YAML: map[apiVersion:crd.k8s.amazonaws.com/v1alpha1 kind:ENIConfig metadata:map[name:eu-central-1b] spec:map[securityGroups:[sg-0329cc8efeae8b0cf sg-0efb5545ec1b733ef] subnet:subnet-073fc7974ff152aa8]]
2023-07-04T18:42:27.209+0200 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2023/07/04 18:42:27 [DEBUG] eu-central-1c Unstructed YAML: map[apiVersion:crd.k8s.amazonaws.com/v1alpha1 kind:ENIConfig metadata:map[name:eu-central-1c] spec:map[securityGroups:[sg-0329cc8efeae8b0cf sg-0efb5545ec1b733ef] subnet:subnet-0200bdef6e567c74f]]
2023-07-04T18:42:27.209+0200 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2023/07/04 18:42:27 [DEBUG] eu-central-1a Unstructed YAML: map[apiVersion:crd.k8s.amazonaws.com/v1alpha1 kind:ENIConfig metadata:map[name:eu-central-1a] spec:map[securityGroups:[sg-0329cc8efeae8b0cf sg-0efb5545ec1b733ef] subnet:subnet-0c3c1b5745dbdad8c]]
2023-07-04T18:42:27.209+0200 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2023/07/04 18:42:27 [DEBUG] eu-central-1a Unstructed YAML: map[apiVersion:crd.k8s.amazonaws.com/v1alpha1 kind:ENIConfig metadata:map[name:eu-central-1a] spec:map[securityGroups:[sg-0329cc8efeae8b0cf sg-0efb5545ec1b733ef] subnet:subnet-0c3c1b5745dbdad8c]]
2023-07-04T18:42:27.210+0200 [WARN]  Provider "registry.terraform.io/gavinbunney/kubectl" produced an invalid plan for module.eks_euc1.kubectl_manifest.eni_config["eu-central-1c"], but we are tolerating it because it is using the legacy plugin SDK.
2023-07-04T18:42:27.210+0200 [WARN]  Provider "registry.terraform.io/gavinbunney/kubectl" produced an invalid plan for module.eks_euc1.kubectl_manifest.eni_config["eu-central-1a"], but we are tolerating it because it is using the legacy plugin SDK.
2023-07-04T18:42:27.210+0200 [WARN]  Provider "registry.terraform.io/gavinbunney/kubectl" produced an invalid plan for module.eks_euc1.kubectl_manifest.eni_config["eu-central-1b"], but we are tolerating it because it is using the legacy plugin SDK.
2023-07-04T18:42:27.211+0200 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/gavinbunney/kubectl/1.14.0/darwin_arm64/terraform-provider-kubectl_v1.14.0 pid=16928
2023-07-04T18:42:27.834+0200 [DEBUG] Resource state not found for node "module.eks_euc1.kubectl_manifest.eni_config[\"eu-central-1c\"]", instance module.eks_euc1.kubectl_manifest.eni_config["eu-central-1c"]
2023-07-04T18:42:27.834+0200 [DEBUG] Resource state not found for node "module.eks_euc1.kubectl_manifest.eni_config[\"eu-central-1a\"]", instance module.eks_euc1.kubectl_manifest.eni_config["eu-central-1a"]
2023-07-04T18:42:27.834+0200 [DEBUG] Resource state not found for node "module.eks_euc1.kubectl_manifest.eni_config[\"eu-central-1b\"]", instance module.eks_euc1.kubectl_manifest.eni_config["eu-central-1b"]

Additional context

fbozic • Jul 06 '23 07:07

I too was severely hit by this. The recommended approach is to move away from that provider and use manifests instead. I tried, but had trouble getting ENIConfig to deploy.

FernandoMiguel • Jul 06 '23 10:07

> I too was severely hit by this. The recommended approach is to move away from that provider and use manifests instead. I tried, but had trouble getting ENIConfig to deploy.

Have you tried helm_release? I've tested it on a new cluster and everything works as expected. Unlike the official kubernetes provider, the helm provider doesn't require CRDs to be available at plan time.

If this is an okay approach I would be happy to make a PR with changes.

fbozic • Jul 10 '23 14:07

helm_release is somewhat limited (you either go with an any-resource Helm chart or you face the limitations around templating) and has some other bugs linked to it as well (sporadic random changes, for example). The originally linked issue has been addressed in my fork (https://github.com/alekc/terraform-provider-kubectl/releases/tag/v2.0.2). You can replace the provider definition and have a go; it should be completely compatible (and as a side effect, several other bugs, such as wait for deletion, have been fixed and released as well).
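
For anyone wanting to try this, the swap would presumably be limited to the provider source in required_providers. A minimal sketch, assuming the fork is published on the Terraform Registry as alekc/kubectl (check the fork's README for the exact source address and current version):

```hcl
terraform {
  required_providers {
    kubectl = {
      # Assumption: registry address of the fork.
      # The example in this repo previously used "gavinbunney/kubectl".
      source = "alekc/kubectl"

      # v2.0.2 is the release linked in the comment above.
      version = ">= 2.0.2"
    }
  }
}
```

In a workspace that already tracks kubectl_manifest resources, terraform state replace-provider gavinbunney/kubectl alekc/kubectl should repoint the existing state entries at the new source before the next plan.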

alekc • Jul 22 '23 12:07

@fbozic I've been experimenting with helm_release and it has worked out quite well. There might be hidden issues I have not uncovered, but so far so good. I am inclined to remove kubectl and replace it with helm_release and a local chart. Let me know if you are still open to a PR, with the caveat that it would have to be done across the board for all examples we have in this repo today.

askulkarni2 • Oct 04 '23 15:10

@askulkarni2 I'll try to open a PR next week once I'm back from PTO.

fbozic • Oct 05 '23 12:10

Just wondering... Is anything preventing the use of kubernetes_manifest from the official kubernetes provider https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/manifest ? 🤔

barryib • Oct 26 '23 09:10

> Just wondering... Is anything preventing the use of kubernetes_manifest from the official kubernetes provider https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/manifest ? 🤔

The CRD has to exist on the cluster during terraform plan. That is not the case for ENIConfig on the first terraform run.
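
To illustrate, here is a sketch of the same ENIConfig written with kubernetes_manifest. It reuses var.azs, var.pods_subnet_ids, and module.eks from the snippets at the top of this issue. It has not been tested here, and is shown only to make the plan-time constraint concrete:

```hcl
# Sketch only: the ENIConfig from this issue expressed with the official
# kubernetes provider. kubernetes_manifest queries the cluster API for the
# resource's schema during plan, so on a fresh cluster, where the ENIConfig
# CRD is not installed yet, the plan is expected to fail.
resource "kubernetes_manifest" "eni_config" {
  for_each = zipmap(var.azs, var.pods_subnet_ids)

  manifest = {
    apiVersion = "crd.k8s.amazonaws.com/v1alpha1"
    kind       = "ENIConfig"
    metadata = {
      name = each.key
    }
    spec = {
      securityGroups = [
        module.eks.cluster_primary_security_group_id,
        module.eks.node_security_group_id,
      ]
      subnet = each.value
    }
  }
}
```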

fbozic • Oct 26 '23 09:10

Oh right. Thanks @fbozic.

barryib • Oct 26 '23 20:10

We will be working on updating the examples provided here shortly. The approach we will be using is as follows:

  • If the manifest is used simply as a demonstration, it will be converted into a standard YAML file, and the pattern's instructions will be updated to use kubectl apply -f <file>.yaml
  • If the manifest is required for the pattern, it will be converted into a local Helm chart so that the helm_release resource can be used to provision it. To be fair, this is a workaround, but it has proven to work fairly well, and it fits in with the EKS Blueprints module, which deploys via the helm_release resource
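
A rough sketch of what the second option could look like for the ENIConfigs in this issue. The chart path and the eniConfigs values key are hypothetical; the local chart would need a template that loops over that key and renders one ENIConfig per entry:

```hcl
resource "helm_release" "eni_configs" {
  name = "eni-configs"

  # Hypothetical path to a chart directory kept alongside the example,
  # used instead of a remote chart repository.
  chart = "${path.module}/helm/eni-config"

  values = [
    yamlencode({
      # Hypothetical values key consumed by the local chart's template:
      # one entry per availability zone.
      eniConfigs = {
        for az, subnet_id in zipmap(var.azs, var.pods_subnet_ids) : az => {
          subnet = subnet_id
          securityGroups = [
            module.eks.cluster_primary_security_group_id,
            module.eks.node_security_group_id,
          ]
        }
      }
    })
  ]
}
```

As with the raw chart shown earlier in the issue, Helm only renders the templates at apply time, so the CRD does not need to exist on the cluster during plan.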

bryantbiggs • Oct 26 '23 21:10

@bryantbiggs I've started working on a fix for this in a forked repo. Are you going to pick this up, and should I stop working on it?

fbozic • Oct 27 '23 08:10

We ended up just applying the CNI configs through the inputs for the EKS addon: https://github.com/defenseunicorns/terraform-aws-eks/blob/main/main.tf#L79-L121

There is additional logic there to create them conditionally, but the idea is that you can feed them into cluster_addons.vpc_cni.configuration_values.eniconfig instead of creating the ENIConfigs from Terraform yourself.
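
A minimal sketch of that idea, not copied from the linked module. The eniConfig keys (create, region, subnets, id, securityGroups) are assumptions about the vpc-cni addon's configuration schema and should be verified with aws eks describe-addon-configuration for the addon version in use. var.region and var.pod_security_group_ids are hypothetical inputs:

```hcl
locals {
  # Addon settings intended to be passed to the EKS module as
  # cluster_addons = { vpc-cni = local.vpc_cni_addon }
  vpc_cni_addon = {
    configuration_values = jsonencode({
      env = {
        # Turn on custom networking and select the ENIConfig by the
        # node's availability zone label.
        AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
        ENI_CONFIG_LABEL_DEF               = "topology.kubernetes.io/zone"
      }

      # Assumed schema: the addon creates one ENIConfig per key in subnets.
      eniConfig = {
        create = true
        region = var.region # hypothetical input
        subnets = {
          for az, subnet_id in zipmap(var.azs, var.pods_subnet_ids) : az => {
            id             = subnet_id
            securityGroups = var.pod_security_group_ids # hypothetical input
          }
        }
      }
    })
  }
}
```

With this approach the addon itself would own the ENIConfig objects, so no kubectl_manifest or helm_release resource is needed for them.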

zack-is-cool • Nov 17 '23 22:11