terraform-provider-kubernetes
rebase on upstream 1.4.0 or cherry-pick important fixes?
The kubernetes_deployment resource is fairly painful to use without https://github.com/terraform-providers/terraform-provider-kubernetes/issues/210, and requires running tf multiple times for a deployment to converge. The fix is part of the upstream 1.4.0 release. Are there plans to rebase this provider on upstream "soonish", or is it preferred to try to cherry-pick/back-port critical fixes from upstream?
Hi @jhoblitt, at this stage I don't intend to rebase this provider on upstream. This fork has diverged quite considerably from upstream and I don't see the reconciliation effort as worthwhile right now. I'm optimistic upstream will catch up with the features in this provider so that this fork can be abandoned. So, for the time being, cherry-picking fixes from upstream will be the way to go.
Also, I'm curious how issue #210 manifests for you. In the 1.5 years of using the kubernetes_deployment resource with this provider I've not seen the kind of issue you describe; our deploys all complete in a single apply.
@sl1pm4t I have also been hoping that upstream will pick up most of the additional resource types, but I'm in a bind as I need the ingress type and am trying to avoid maintaining an internal fork.
In fairness, I haven't yet tried to cherry-pick terraform-providers#210 to see if it resolves the problem I'm seeing, but the problem definitely isn't present with upstream 1.4.0 (note that switching between this fork and upstream also requires a minor change to the deployment syntax, which is a frustration).
An example of a failure is using a module to install tiller for the helm provider and then trying to use helm resources. This will fail on at least the first tf run because the helm provider tries to talk to the tiller pod before the ReplicaSet/pods have finished provisioning. If the docker image pull is slow or the k8s cluster is busy, sometimes a second tf run is still too fast and will fail again. This is on top of a strange error from the kubernetes_deployment resource itself even though the deployment is properly created. E.g.:
https://github.com/lsst-sqre/terraform-gitlfs/blob/800eae562de6f698936f5d5498ee01dfd55bb822/tf/main.tf#L59-L78
module "tiller" {
source = "git::https://github.com/lsst-sqre/terraform-tinfoil-tiller.git//?ref=sl1pm4t-1.3.0"
namespace = "kube-system"
service_account = "tiller"
tiller_image = "gcr.io/kubernetes-helm/tiller:v2.11.0"
}
provider "helm" {
version = "~> 0.7.0"
service_account = "${module.tiller.service_account}"
namespace = "${module.tiller.namespace}"
install_tiller = false
kubernetes {
host = "${module.gke.host}"
cluster_ca_certificate = "${base64decode(module.gke.cluster_ca_certificate)}"
}
}
First run error on a 1.11.5-gke.5 cluster:
Error: Error applying plan:
2 error(s) occurred:
* module.tiller.kubernetes_deployment.tiller_deploy: 1 error(s) occurred:
* kubernetes_deployment.tiller_deploy: an error on the server ("service unavailable") has prevented the request from succeeding
* module.nginx_ingress.helm_release.nginx_ingress: 1 error(s) occurred:
* helm_release.nginx_ingress: error creating tunnel: "could not find tiller"
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
[terragrunt] 2019/01/09 11:16:48 Detected 1 Hooks
[terragrunt] 2019/01/09 11:16:48 Hit multiple errors:
exit status 1
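One possible workaround for the ordering half of this, sketched below rather than taken from the thread: block the helm releases until the tiller Deployment has actually rolled out, using a null_resource with a local-exec provisioner. This assumes kubectl is installed on the machine running Terraform and is already pointed at the cluster, that the tiller module creates a Deployment named tiller-deploy in kube-system, and that the helm_release shown stands in for whatever releases depend on tiller. It does not address the "service unavailable" error from kubernetes_deployment itself.

# Sketch of a possible workaround, not from the original report.
# Assumes kubectl is available locally and pointed at the target cluster,
# and that the tiller module creates a Deployment named "tiller-deploy".
resource "null_resource" "wait_for_tiller" {
  # Re-run the check whenever the tiller image (and therefore the
  # Deployment) changes.
  triggers {
    tiller_image = "gcr.io/kubernetes-helm/tiller:v2.11.0"
  }

  provisioner "local-exec" {
    command = "kubectl -n kube-system rollout status deployment/tiller-deploy"
  }
}

# Hypothetical release standing in for module.nginx_ingress; the point is
# the explicit depends_on, which keeps Terraform from attempting the
# release until the rollout check has passed.
resource "helm_release" "nginx_ingress" {
  name  = "nginx-ingress"
  chart = "stable/nginx-ingress"

  depends_on = ["null_resource.wait_for_tiller"]
}

This only papers over the race by shelling out to kubectl; it is not a substitute for the provider waiting on the rollout itself, which is what terraform-providers#210 adds.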
Fairly sure that's a Terraform limitation. Providers can't take configuration data from resources and work on the first run.
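To spell out that limitation (a sketch with explanatory comments, not code from this thread): the helm provider block above is configured from attributes of resources that don't exist until after the first apply, so Terraform can't fully configure the provider when it first needs it. A common, if clunky, workaround is a staged apply with -target.

# Why the first apply can fail: the provider configuration depends on
# resource attributes that are unknown until those resources exist.
provider "helm" {
  # Unknown until module.tiller has been applied at least once.
  service_account = "${module.tiller.service_account}"
  namespace       = "${module.tiller.namespace}"

  kubernetes {
    # Unknown until module.gke has created the cluster.
    host                   = "${module.gke.host}"
    cluster_ca_certificate = "${base64decode(module.gke.cluster_ca_certificate)}"
  }
}

# One workaround is to stage the apply so the dependencies exist before
# the helm provider is configured, e.g.:
#   terraform apply -target=module.gke -target=module.tiller
#   terraform apply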