
failed to validate provider configuration still happening

Open thecodeassassin opened this issue 4 years ago • 8 comments

The issue reported here => https://github.com/hashicorp/terraform-provider-kubernetes-alpha/issues/124 is still present. With 0.2.0 everything works fine, but everything breaks on 0.2.1:

 data "google_container_cluster" "gke_cluster" {
  project  = var.gke_project
  name     = var.cluster.name
  location = var.cluster.region
}

data "google_client_config" "provider" {}

provider "kubernetes-alpha" {
  host  = "https://${data.google_container_cluster.gke_cluster.endpoint}"
  token = data.google_client_config.provider.access_token

  client_certificate     = base64decode(data.google_container_cluster.gke_cluster.master_auth[0].client_certificate)
  client_key             = base64decode(data.google_container_cluster.gke_cluster.master_auth[0].client_key)
  cluster_ca_certificate = base64decode(data.google_container_cluster.gke_cluster.master_auth[0].cluster_ca_certificate)
}

resource "kubernetes_manifest" "test-ns" {
  provider = kubernetes-alpha

  manifest = {
    apiVersion = "v1"
    kind       = "Namespace"
    metadata = {
      name = "storage-test"
    }
  }
}

With 0.2.1:

Error: rpc error: code = Unknown desc = failed to validate provider configuration

Terraform version: 0.14.2

thecodeassassin avatar Feb 19 '21 15:02 thecodeassassin

Hi @thecodeassassin.

It is expected that this configuration will not work reliably, as Terraform doesn't officially support supplying provider configuration values from other resources or data sources. This provider in particular needs access to a working API server at plan time (unlike most other providers, which only need it at apply time). The reason it appeared to work (sometimes) in the earlier version is that validation was less strict.

We've strengthened the validation since then, and the upcoming release v0.3.0 will have a hard requirement for the Kubernetes API to be available at plan time, because we use the OpenAPI definitions served by the cluster during the plan phase.

To be prepared for upcoming releases of this provider, I advise breaking your configuration into two steps that are applied discretely: the first creates the cluster and supporting infrastructure, the second applies the Kubernetes manifest resources. This is the only arrangement guaranteed to work reliably right now.
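For illustration, a minimal sketch of that two-step layout, assuming two separate root modules (the directory names and cluster arguments are illustrative, not taken from this thread):

# clusters/main.tf - step 1: create the cluster and supporting infrastructure
resource "google_container_cluster" "gke_cluster" {
  project  = var.gke_project
  name     = var.cluster_name
  location = var.cluster_region
  # ... node pools, networking, etc.
}

# manifests/main.tf - step 2: applied only after step 1 has finished,
# so the API server is reachable during plan
data "google_container_cluster" "gke_cluster" {
  project  = var.gke_project
  name     = var.cluster_name
  location = var.cluster_region
}

data "google_client_config" "provider" {}

provider "kubernetes-alpha" {
  host                   = "https://${data.google_container_cluster.gke_cluster.endpoint}"
  token                  = data.google_client_config.provider.access_token
  cluster_ca_certificate = base64decode(data.google_container_cluster.gke_cluster.master_auth[0].cluster_ca_certificate)
}

resource "kubernetes_manifest" "test-ns" {
  provider = kubernetes-alpha

  manifest = {
    apiVersion = "v1"
    kind       = "Namespace"
    metadata   = { name = "storage-test" }
  }
}

Each directory gets its own terraform apply (cd clusters && terraform apply, then cd ../manifests && terraform apply), so the manifests configuration always plans against a cluster that already exists.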

If you want to understand the root of the issue, it traces back to the way Terraform was designed to work internally, in an era before Kubernetes, when a provider block was not expected to take values from another resource.

Here are the historical details if they are interesting to you: https://github.com/hashicorp/terraform/issues/4149

alexsomesan avatar Feb 24 '21 11:02 alexsomesan

Hi @alexsomesan, we already do this. The cluster is up and running, and this is a separate repo from the cluster itself. It still doesn't work and still gives us this error.

thecodeassassin avatar Feb 24 '21 11:02 thecodeassassin

Terraform doesn't officially support supplying provider configuration values from other resources or datasources

Since when? This is literally the most common method of configuring the helm and/or kubernetes providers and works fairly flawlessly.

davidalger avatar Feb 24 '21 14:02 davidalger

@davidalger my apologies, I don't think I mentioned this, but what I meant was that you cannot use different provider credentials in a module; Terraform then throws an error.

Referring to this issue: https://github.com/hashicorp/terraform/issues/24476

thecodeassassin avatar Feb 24 '21 15:02 thecodeassassin

Terraform doesn't officially support supplying provider configuration values from other resources or datasources

Since when? This is literally the most common method of configuring the helm and/or kubernetes providers and works fairly flawlessly.

It's been documented for a while in Terraform's language docs. See the paragraph about expressions here: https://www.terraform.io/docs/language/providers/configuration.html#provider-configuration-1

alexsomesan avatar Mar 11 '21 22:03 alexsomesan

@thecodeassassin can you try this again with provider version v0.3.1? I'd like to find out if this is still an issue after the refactoring.

alexsomesan avatar Mar 11 '21 22:03 alexsomesan

@alexsomesan Can we revisit the assumption that we cannot use dynamic values to configure a provider? Making this a strict requirement would have a large effect on our team and on the Terraform ecosystem at large. I want to say that my team really appreciates the direction this Kubernetes provider is moving in; allowing us to manage custom resources is a huge benefit to us, but introducing this restriction dampens that enthusiasm.

For my team's use case, we have curated a Terraform module that we use for cluster creation and configuration. It uses a combination of the kubernetes, helm, rancher2, and aws providers. Up until now we have been able to run simple Terraform commands like apply and destroy against that one module, in a single pass. This makes it easy to maintain and upgrade our clusters, and all of our tooling assumes that we can create and destroy the full stack in one Terraform operation. Requiring static provider configuration would force us to refactor the existing Terraform configurations for all of our clusters.

More broadly, this precedent would impact existing providers and modules in the Terraform registry, some of which we also use in our team, like the https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest module for creating and upgrading EKS clusters. It uses the kubernetes provider in order to maintain a ConfigMap in the cluster, which it also creates. So when this kubernetes-alpha restriction is merged into the mainline kubernetes provider, it would have a severe impact on that popular module.

I recognize that the limitation aligns with the official Terraform documentation, but it runs counter to industry conventions as we have experienced them. As @davidalger already mentioned, it is perfectly functional and highly useful in these providers today. Taking this stand would create a bifurcation in the Terraform provider ecosystem, where some providers adhere to the restriction and others do not. For those of us curating in-house Terraform modules for provisioning infrastructure, the revelation that certain providers and modules require multiple applies adds a lot of required knowledge and undermines the abstraction that Terraform modules give us.

I would encourage giving this some more thought before closing this door, since it will have significant downstream effects for Terraform users broadly.

armsnyder avatar Mar 17 '21 22:03 armsnyder

I would like to chime in here that we use this provider to create several kubernetes_* resources. We're running our builds remotely on Terraform Cloud, with a GOOGLE_CREDENTIALS environment variable set.

With this setup, the following works on terraform apply only. Once the one-hour token expires, we see errors when trying to run terraform destroy.

data "google_client_config" "default" {}

data "google_container_cluster" "apps" {
  project  = var.project_id
  name     = var.cluster_name
  location = var.cluster_region
}

provider "kubernetes" {
  host                   = "https://${data.google_container_cluster.apps.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(data.google_container_cluster.apps.master_auth[0].cluster_ca_certificate)
}

resource "kubernetes_manifest" "issuer" {
  provider = kubernetes-alpha

  manifest = {
    # manifest
  }
}

We have the following stacks that are applied sequentially:

  1. cd infra-gke; terraform apply: Builds the GKE cluster.
  2. cd infra-cert-manager; terraform apply: Installs the Helm cert-manager chart.
  3. cd infra-cert-manager-issuers; terraform apply: Applies the kubernetes_manifest issuer resources shown above.

When we run those three steps, everything works fantastically. If we run terraform destroy inside infra-cert-manager-issuers within one hour of the apply, it destroys fine. If we run terraform destroy after one hour, the token from data "google_client_config" has expired. Our workaround would be to run terraform refresh, but since we're on Terraform Cloud we instead add a resource "random_integer" "temp" to each stack and run terraform apply; terraform destroy. This is a hacky workaround, but it works for now.
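For reference, a minimal sketch of that placeholder resource, using the hashicorp/random provider (the min/max values are arbitrary; the resource name temp comes from the description above):

# Placeholder resource that gives the stack something to apply.
# Running terraform apply refreshes the data sources (and therefore the
# access token held in state) right before the subsequent terraform destroy.
resource "random_integer" "temp" {
  min = 1
  max = 10000
}

The flow then becomes terraform apply followed immediately by terraform destroy on the same stack, so the destroy always runs against a freshly refreshed token.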

If there is a movement toward removing dynamic values from the provider "kubernetes" configuration, we will have to abandon Terraform for all manifest and Helm creation.

ellisio avatar Mar 18 '21 04:03 ellisio