terraform-provider-flux

Flux-system namespace in terminating state

Open · iuriipro opened this issue on Feb 01 '21 · 27 comments

Hi. We are using a generic git repo and installed Flux v2 with the help of the Terraform provider. That worked, but we also wanted a mechanism to update Flux v2 via Terraform; we don't push the Flux manifests generated by the provider to the repo. The idea was to change the Flux version in the "flux_install" data source, so that Terraform tracks the version change and applies the newly generated manifests. But during this procedure the flux-system namespace is removed as well and ends up in the "Terminating" state. Why? Yes, the namespace gets some labels, but other resources can be stored in it, secrets for example. Can you please suggest what we can do in this case, or could the behavior be changed a little?
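For context, a minimal sketch of the update mechanism we had in mind (the version value below is only a placeholder):

data "flux_install" "main" {
  target_path = "prod"
  # Bumping this version regenerates the manifests so Terraform can roll Flux
  # forward; it should not force the flux-system namespace to be recreated.
  version = "v0.8.0"
}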

iuriipro avatar Feb 01 '21 13:02 iuriipro

Hi @iuriipro, we encountered the same issue. The problem is that the generated Kubernetes resources of data.flux_install include the flux-system namespace. This isn't necessary, because the namespace is already created by the kubernetes_namespace.flux_system resource. Therefore, we simply filtered out the namespace from the list output of data.kubectl_file_documents:

locals {
  flux_apply_yaml_documents_without_namespace = [for x in data.kubectl_file_documents.apply.documents: x if length(regexall("kind: Namespace", x)) == 0]
}

The flux_apply_yaml_documents_without_namespace local is then the new input for the kubectl_manifest.apply resource.
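For reference, a rough sketch of that wiring (keying the manifests by document index here is just one option, not necessarily exactly what we run):

resource "kubectl_manifest" "apply" {
  # Re-key the filtered documents so each manifest gets a stable address.
  for_each   = { for i, doc in local.flux_apply_yaml_documents_without_namespace : tostring(i) => doc }
  depends_on = [kubernetes_namespace.flux_system]
  yaml_body  = each.value
}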

Now, if you update the flux_install config, this no longer leads to the namespace being deleted.

I hope this helps. Best regards Robert

bobrossthepainter avatar Feb 01 '21 14:02 bobrossthepainter

Thank you, good point, I will recheck it.

iuriipro avatar Feb 01 '21 14:02 iuriipro

@iuriipro Terraform should not remove the namespace when updating the version of Flux. Could you please post the version of the provider you are using and the specific HCL that is causing this issue?

phillebaba avatar Feb 01 '21 21:02 phillebaba

Hi, I used the Flux provider at version 0.0.10 and the HCL from the example in the Terraform documentation. As I can see, something changed with version 0.0.11. Thanks.

iuriipro avatar Feb 02 '21 10:02 iuriipro

@phillebaba Could you please clarify whether you faced this issue when using the Flux Terraform provider at version 0.0.11?

Error: Invalid for_each argument

  on ../../modules/flux_v2/main.tf line 60, in resource "kubectl_manifest" "install":
  60:   for_each   = { for v in local.install : lower(join("/", compact([v.data.apiVersion, v.data.kind, lookup(v.data.metadata, "namespace", ""), v.data.metadata.name]))) => v.content }

The "for_each" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the for_each depends on.

iuriipro avatar Feb 02 '21 11:02 iuriipro

So the latest version of the provider changed the guide for the better, hopefully. It should solve the majority of issues people are seeing with resources being recreated, which is the root of most problems regarding namespaces stuck in a deleting state. I wish I had solved the issue immediately, but it took some experimenting for the solution to become obvious.

Are you getting this error after copying the guide in the docs? Could you post all the Terraform, as I can't really find the problem otherwise. This is a Terraform core error caused when you use computed properties as the input to the data source.
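To illustrate what "computed properties as input" can look like, a purely hypothetical example (not taken from your config):

# Hypothetical: the url below comes from a managed resource attribute that is
# only known after apply, so the generated documents, and the for_each keys
# derived from them, are unknown at plan time.
data "flux_sync" "main" {
  target_path = "prod"
  url         = github_repository.example.ssh_clone_url
  branch      = "main"
}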

phillebaba avatar Feb 02 '21 12:02 phillebaba

@phillebaba Thank you. I understand that it is just a Terraform core error and the behavior of "for_each", but I am following the example from the documentation.

data "flux_install" "main" {
  target_path = "prod"
  version     = var.flux_v2_version
}

data "flux_sync" "main" {
  target_path = "prod"
  url         = "ssh://[email protected]/${var.repository_name}"
  branch      = var.branch
}

resource "kubernetes_namespace" "flux_system" {
  metadata {
    name = "flux-system"
  }

  lifecycle {
    ignore_changes = [
      metadata[0].labels,
    ]
  }
}

data "kubectl_file_documents" "install" {
  content = data.flux_install.main.content
}

data "kubectl_file_documents" "sync" {
  content = data.flux_sync.main.content
}

locals {
  install = [for v in data.kubectl_file_documents.install.documents : {
    data : yamldecode(v)
    content : v
    }
  ]
  sync = [for v in data.kubectl_file_documents.sync.documents : {
    data : yamldecode(v)
    content : v
    }
  ]
}

resource "kubectl_manifest" "install" {
  for_each   = { for v in local.install : lower(join("/", compact([v.data.apiVersion, v.data.kind, lookup(v.data.metadata, "namespace", ""), v.data.metadata.name]))) => v.content }
  depends_on = [kubernetes_namespace.flux_system]
  yaml_body  = each.value
}

resource "kubectl_manifest" "sync" {
  for_each   = { for v in local.sync : lower(join("/", compact([v.data.apiVersion, v.data.kind, lookup(v.data.metadata, "namespace", ""), v.data.metadata.name]))) => v.content }
  depends_on = [kubernetes_namespace.flux_system]
  yaml_body  = each.value
}

iuriipro avatar Feb 02 '21 12:02 iuriipro

Could you follow this guide and see if you get the same issues? https://registry.terraform.io/providers/fluxcd/flux/latest/docs/guides/github

phillebaba avatar Feb 02 '21 13:02 phillebaba

@phillebaba Thank you, I have tried it before; I will try one more time, but I am not sure it will work correctly.

iuriipro avatar Feb 02 '21 14:02 iuriipro

@phillebaba I have tried it; it doesn't work. Do you have any suggestions? Or should I try 0.0.10? https://www.terraform.io/docs/language/meta-arguments/for_each.html#limitations-on-values-used-in-for_each

iuriipro avatar Feb 02 '21 15:02 iuriipro

I don't understand. I ran the guide from a clean state just now without any issues; I even tested it with Terraform 0.13 to make sure that was not the problem. This was done with the latest provider.

phillebaba avatar Feb 02 '21 15:02 phillebaba

I ran it following the guide, using Terraform 0.14.4, and have tried many times. The examples from version 0.0.10 worked successfully. I will recheck 0.0.11 again, maybe rewrite something. Thank you.

iuriipro avatar Feb 02 '21 16:02 iuriipro

@phillebaba I found that using depends_on on another module in the flux_v2 module block caused the issue. After the last retry, version 0.0.11 works fine and looks good to me. Thank you.
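For anyone hitting the same thing, a hypothetical sketch of what I mean (the module names are made up): a module-level depends_on defers reading the data sources inside the module until apply time, which makes the for_each keys unknown during plan.

module "flux_v2" {
  source = "../../modules/flux_v2"

  # Hypothetical: depending on another module here forces the flux_install and
  # flux_sync data sources inside flux_v2 to be read at apply time, producing
  # the "Invalid for_each argument" error shown above.
  depends_on = [module.eks]
}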

iuriipro avatar Feb 03 '21 08:02 iuriipro

@iuriipro Could you please share the final TF file? This is my config:

terraform {
  required_version = ">= 0.14"
  required_providers {
    kubectl = {
      source = "gavinbunney/kubectl"
      version = "1.9.4"
    }
    flux = {
      source  = "fluxcd/flux"
      version = "0.0.11"
    }
  }
}

provider "flux" {
  alias = "gitops"
}

provider "kubectl" {
  alias = "shoot"
}

provider "kubernetes" {
  alias = "shoot"
}

provider "github" {
  alias = "gitops"
}

data "github_repository" "gitops-demo" {
  provider = github.gitops
  full_name = join("/", [ var.github_org, var.repository_name])
}

# Flux
data "flux_install" "master" {
  provider = flux.gitops
  target_path = var.target_path
}

data "flux_sync" "master" {
  provider = flux.gitops
  target_path = var.target_path
  url         = data.github_repository.gitops-demo.http_clone_url
  branch      = var.branch
}

# Kubernetes
resource "kubernetes_namespace" "flux_system" {
  provider = kubernetes.shoot
  metadata {
    name = "flux-system"
  }

  lifecycle {
    ignore_changes = [
      metadata[0].labels,
    ]
  }
}

data "kubectl_file_documents" "install" {
  provider = kubectl.shoot
  content = data.flux_install.master.content
}

data "kubectl_file_documents" "sync" {
  provider = kubectl.shoot
  content = data.flux_sync.master.content
}

locals {
  install = [ for v in data.kubectl_file_documents.install.documents : {
      data: yamldecode(v)
      content: v
    }
  ]
  sync = [ for v in data.kubectl_file_documents.sync.documents : {
      data: yamldecode(v)
      content: v
    }
  ]
}

resource "kubectl_manifest" "install" {
  provider = kubectl.shoot
  for_each   = { for v in local.install : lower(join("/", compact([v.data.apiVersion, v.data.kind, lookup(v.data.metadata, "namespace", ""), v.data.metadata.name]))) => v.content }
  depends_on = [kubernetes_namespace.flux_system]
  yaml_body = each.value
}

resource "kubectl_manifest" "sync" {
  provider = kubectl.shoot
  for_each   = { for v in local.sync : lower(join("/", compact([v.data.apiVersion, v.data.kind, lookup(v.data.metadata, "namespace", ""), v.data.metadata.name]))) => v.content }
  depends_on = [kubernetes_namespace.flux_system]
  yaml_body = each.value
}

resource "kubernetes_secret" "master" {
  provider = kubernetes.shoot
  depends_on = [kubectl_manifest.install]

  metadata {
    name      = data.flux_sync.master.namespace
    namespace = data.flux_sync.master.namespace
  }

  data = {
    password   = var.github_token
    username   = var.github_owner
  }
}

# Github
resource "github_repository_file" "install" {
  provider = github.gitops
  repository = var.repository_name
  file       = data.flux_install.master.path
  content    = data.flux_install.master.content
  branch     = var.branch
}

resource "github_repository_file" "sync" {
  provider = github.gitops
  repository = var.repository_name
  file       = data.flux_sync.master.path
  content    = data.flux_sync.master.content
  branch     = var.branch
}

resource "github_repository_file" "kustomize" {
  provider = github.gitops
  repository = var.repository_name
  file       = data.flux_sync.master.kustomize_path
  content    = data.flux_sync.master.kustomize_content
  branch     = var.branch
}

output "gitops-demo-repo-url" {
  value = data.github_repository.gitops-demo.http_clone_url
  description = "gitops demo repo url"
}

linuxbsdfreak avatar Feb 04 '21 11:02 linuxbsdfreak

@linuxbsdfreak Hi, could you please clarify what issue you are having? At first glance, your config looks OK.

iuriipro avatar Feb 04 '21 14:02 iuriipro

Hi @iuriipro ,

The issue is the same: the flux-system namespace is stuck in the Terminating state when deprovisioning with the provider. I assumed you had solved it cleanly.

Kevin

linuxbsdfreak avatar Feb 04 '21 19:02 linuxbsdfreak

I also had the problem of the namespace being stuck terminating: after all other resources were destroyed by Terraform, the namespace remained. I solved it by removing the only finalizer in the list:

kubectl edit kustomizations.kustomize.toolkit.fluxcd.io
# it is
  finalizers:
  - finalizers.fluxcd.io
  generation: 1
# should be
  finalizers:
  generation: 1

d47zm3 avatar Apr 09 '21 07:04 d47zm3

Thank you, I know about it, but it's not a very good fit for automation.

iuriipro avatar Apr 09 '21 08:04 iuriipro

Hi @iuriipro, was the problem fixed for you? I still have this issue even with the latest version. The namespace cannot be deleted without removing the finalizer, which is not good, as it leaves dangling resources.

flux = {
  source  = "fluxcd/flux"
  version = ">= 0.0.13"
}

hieumoscow avatar Jun 07 '21 23:06 hieumoscow

Thanks @bobrossthepainter for the idea.

  flux_apply_yaml_documents_without_namespace = [for x in local.install: x if x.data.kind != "Namespace"]

snahelou avatar Jul 29 '21 09:07 snahelou

@snahelou can you please paste your entire config? I'm still having this issue and I'd like to see this working line in context.

throwawayaccount0153 avatar Nov 23 '21 18:11 throwawayaccount0153

I have the same issue with my Flux CD. Where do you update this?

Kevinwoolworth avatar Mar 24 '22 04:03 Kevinwoolworth

I have added a condition to the Flux template, and my namespace is created independently. I still have the same issue with the Flux Kustomization deletion.

resource "kubectl_manifest" "install" {
  for_each   = {
  for v in local.install : lower(join("/", compact([
    v.data.apiVersion, v.data.kind, lookup(v.data.metadata, "namespace", ""), v.data.metadata.name
  ]))) => v.content if v.data.kind !="Namespace"
  }
  depends_on = [kubernetes_namespace.flux_system]
  yaml_body  = each.value
}

In this topic (https://github.com/fluxcd/kustomize-controller/issues/666) I got a proposal to delete Flux manually via flux uninstall, but this could break my TF state.

qspors avatar May 27 '22 16:05 qspors

Same issue here

When I try to destroy the Terraform infrastructure, it hangs while destroying the namespace (I have to manually delete the finalizers for the GitRepository and Kustomization). To avoid this, I'm using a "null_resource" to create the namespace:

#resource "kubernetes_namespace" "flux_system" {
#  metadata {
#    name = "flux-system"
#  }
#
#  lifecycle {
#    ignore_changes = [
#      metadata[0].labels,
#    ]
#  }
#}

resource "null_resource" "create_namespace_flux_system" {
  triggers = {
    // fire any time the cluster is update in a way that changes its endpoint or auth
    endpoint = module.eks.cluster_endpoint
    ca_crt   = base64decode(module.eks.cluster_certificate_authority_data)
    #token    = data.aws_eks_cluster_auth.default.token
  }

  provisioner "local-exec" {
    command = <<EOH
cat >/tmp/ca.crt <<EOF
${base64decode(module.eks.cluster_certificate_authority_data)}
EOF
KUBECTL_LATEST=$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt) 
curl -LO https://storage.googleapis.com/kubernetes-release/release/$KUBECTL_LATEST/bin/linux/amd64/kubectl && chmod +x ./kubectl
./kubectl \
  --server="${module.eks.cluster_endpoint}" \
  --token="${data.aws_eks_cluster_auth.default.token}" \
  --certificate-authority=/tmp/ca.crt \
  create ns flux-system || /bin/true
rm -f /tmp/ca.crt || /bin/true
rm -f kubectl || /bin/true
EOH
  }
}

Now I can use terraform destroy without the issue (the cluster is destroyed with the namespace)

Do you guys have another workaround?

juancarlosm avatar Sep 02 '22 09:09 juancarlosm

Do you guys have another workaround?

Yes, a better option is to run flux uninstall, which knows how to remove things in the right order without touching any deployed workloads.

stefanprodan avatar Sep 02 '22 10:09 stefanprodan

Thanks for your comments @stefanprodan

Now I'm using flux uninstall inside terraform this way:

resource "local_sensitive_file" "kubeconfig" {
  content = <<-EOF
apiVersion: v1
kind: Config
current-context: terraform
clusters:
- cluster:
    certificate-authority-data: ${module.eks.cluster_certificate_authority_data}
    server: ${module.eks.cluster_endpoint}
  name: ${data.aws_eks_cluster_auth.default.name}
contexts:
- context:
    cluster: ${data.aws_eks_cluster_auth.default.name}
    user: terraform
  name: terraform
users:
- name: terraform
  user:
    token: ${data.aws_eks_cluster_auth.default.token}
EOF
  filename = "./auth/kubeconfig"
  file_permission = "0600"
  directory_permission = "0755"
}

# Create namespace
resource "kubernetes_namespace" "flux_system" {
  metadata {
    name = "flux-system"
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<EOH
curl -s https://fluxcd.io/install.sh | /bin/bash
flux --kubeconfig=./auth/kubeconfig uninstall --namespace=flux-system --silent --keep-namespace --verbose
EOH
  }

  lifecycle {
    ignore_changes = [
      metadata[0].labels,
    ]
  }

  depends_on = [local_sensitive_file.kubeconfig]
}

And it works fine:

terraform destroy:

kubernetes_namespace.flux_system: Destroying... [id=flux-system]
kubernetes_namespace.flux_system: Provisioning with 'local-exec'...
kubernetes_namespace.flux_system (local-exec): Executing: ["/bin/sh" "-c" "curl -s https://fluxcd.io/install.sh | /bin/bash\nflux --kubeconfig=./auth/kubeconfig uninstall --namespace=flux-system --silent --keep-namespace --verbose\nrm -f /usr/local/bin/flux || /bin/true\n"]
kubernetes_namespace.flux_system (local-exec): [INFO]  Downloading metadata https://api.github.com/repos/fluxcd/flux2/releases/latest
kubernetes_namespace.flux_system (local-exec): [INFO]  Using 0.33.0 as release
kubernetes_namespace.flux_system (local-exec): [INFO]  Downloading hash https://github.com/fluxcd/flux2/releases/download/v0.33.0/flux_0.33.0_checksums.txt
kubernetes_namespace.flux_system (local-exec): [INFO]  Downloading binary https://github.com/fluxcd/flux2/releases/download/v0.33.0/flux_0.33.0_linux_amd64.tar.gz
kubernetes_namespace.flux_system (local-exec): [INFO]  Verifying binary download
kubernetes_namespace.flux_system (local-exec): [INFO]  Installing flux to /usr/local/bin/flux
kubernetes_namespace.flux_system (local-exec): ► deleting components in flux-system namespace
kubernetes_namespace.flux_system (local-exec): ► deleting toolkit.fluxcd.io finalizers in all namespaces
kubernetes_namespace.flux_system (local-exec): ✔ Kustomization/flux-system/flux-system finalizers deleted
kubernetes_namespace.flux_system (local-exec): ► deleting toolkit.fluxcd.io custom resource definitions
kubernetes_namespace.flux_system (local-exec): ✔ uninstall finished
kubernetes_namespace.flux_system: Still destroying... [id=flux-system, 10s elapsed]
kubernetes_namespace.flux_system: Destruction complete after 14s

juancarlosm avatar Sep 05 '22 14:09 juancarlosm