terraform-provider-kustomization icon indicating copy to clipboard operation
terraform-provider-kustomization copied to clipboard

json error during kustomizationResourceDiff

Open wpbeckwith opened this issue 2 years ago • 16 comments

We have a terraform config like

# Other modules and terraform
... 
module "foo" {
  helm_version = "1.2.0"
...
}

module "bar" {
  # this module uses the kbst provider
  depends_on [
    module.foo.release_metdata
  ]
}

The bar module, which installs fluxcd.io, does not use the any output of the foo module, it just has a dependency on 1 output to ensure that the foo module completes 1st. Now when the terraform config is 1st created all is well. And all subsequent plans are fine too, until I update the version # in the foo module to say 1.2.1. Then the terraform plan will always fails with errors like

╷
│ Error: github.com/kbst/terraform-provider-kustomize/kustomize.kustomizationResourceDiff: github.com/kbst/terraform-provider-kustomize/kustomize.(*kManifest).load: json error: unexpected end of JSON input
│ 
│   with module.cpu_dev2_eks.module.eks_cluster_init.module.fluxcd.kustomization_resource.p0["_/Namespace/_/flux-system"],
│   on .terraform/modules/cpu_dev2_eks.eks_cluster_init.fluxcd/fluxcd/main.tf line 59, in resource "kustomization_resource" "p0":
│   59: resource "kustomization_resource" "p0" {
│ 
╵
╷
│ Error: github.com/kbst/terraform-provider-kustomize/kustomize.kustomizationResourceDiff: github.com/kbst/terraform-provider-kustomize/kustomize.(*kManifest).load: json error: unexpected end of JSON input
│ 
│   with module.cpu_dev2_eks.module.eks_cluster_init.module.fluxcd.kustomization_resource.p0["apiextensions.k8s.io/CustomResourceDefinition/_/providers.notification.toolkit.fluxcd.io"],
│   on .terraform/modules/cpu_dev2_eks.eks_cluster_init.fluxcd/fluxcd/main.tf line 59, in resource "kustomization_resource" "p0":
│   59: resource "kustomization_resource" "p0" {
│ 
╵
╷
│ Error: github.com/kbst/terraform-provider-kustomize/kustomize.kustomizationResourceDiff: github.com/kbst/terraform-provider-kustomize/kustomize.(*kManifest).load: json error: unexpected end of JSON input
│ 
│   with module.cpu_dev2_eks.module.eks_cluster_init.module.fluxcd.kustomization_resource.p0["apiextensions.k8s.io/CustomResourceDefinition/_/alerts.notification.toolkit.fluxcd.io"],
│   on .terraform/modules/cpu_dev2_eks.eks_cluster_init.fluxcd/fluxcd/main.tf line 59, in resource "kustomization_resource" "p0":
│   59: resource "kustomization_resource" "p0" {
│ 
╵
...

This continues until I set the version in the foo module back to the original 1.2.0 and then the plans will successfully complete.

wpbeckwith avatar Jan 13 '23 09:01 wpbeckwith

I need a config to reproduce the issue to look into it.

pst avatar Jan 22 '23 10:01 pst

Sure, I'll see if I can slim down our current config to a reproducible case.

wpbeckwith avatar Jan 25 '23 07:01 wpbeckwith

Also getting blocked by this error, I've narrowed it down to empty dm (new value for manifest) string in here: https://github.com/kbst/terraform-provider-kustomization/blob/1957b1eccaadadec5709675648a1860d67f6e56b/kustomize/resource_kustomization.go#L185-L191

nazarewk avatar Jan 27 '23 14:01 nazarewk

not sure if it's way to go, have no time to dig into the root cause, but code changes in my PR fixed the issue without breaking anything (getting plan as expected)

nazarewk avatar Jan 27 '23 14:01 nazarewk

I haven't narrowed this done to a smaller case ATM, been OOO for bereavement. However, I did discover by accident that when I tried to run a terraform plan several days later it worked. Makes me think there is some type of caching being done and after a while the cache expires and a full set of data for the diff is sent.

wpbeckwith avatar Jan 28 '23 07:01 wpbeckwith

@nazarewk what we would need to figure out is if modified being empty is an expected state in Terraform or if there is some error causing this. If there is an error that causes modified to be empty, either in the manifests themselves or somewhere else along the way, just returning nil on the diff seems like a dangerous approach.

pst avatar Feb 02 '23 11:02 pst

I am quite sure it is caused by -> (known after apply), but need to somehow confirm it

Note I'm having a little exotic workflow:

  1. I render the kustomize without arguments to discover the object IDs
  2. I render the kustomize with patches based on resources
  3. I use pre-rendered object IDs and fully rendered manifests in a resource
code
module "prerender" {
  source = "../../../modules/k8s/kustomization-overlay"

  # can prerender, because patches won't change set of resource identifiers
  resources = [
    "${path.module}/k8s",
  ]
}

module "source" {
  source = "../../../modules/k8s/kustomization-overlay"

  resources = [
    "${path.module}/k8s",
  ]
  patches = concat(values(module.irsa.kustomize_patches), [
    {
      target : { "kind" : "ClusterIssuer" },
      patch : jsonencode([
        {
          op   = "add"
          path = "/spec/acme/solvers"
          value = [
            # see https://cert-manager.io/docs/configuration/acme/dns01/route53/#creating-an-issuer-or-clusterissuer
            {
              selector = { dnsZones = [for zone_id, zone in data.aws_route53_zone.these : zone.name] }
              dns01 = { route53 = {
                region = var.aws_region
              } }
            }
          ]
        }
      ])
    }
  ])
}

module "kustomization-apply" {
  source = "../../../modules/k8s/kustomization-apply"

  ids_prio  = module.prerender.ids_prio
  manifests = module.source.manifests
}

nazarewk avatar Feb 09 '23 12:02 nazarewk

actually with my "fix" it did not update the values until the next run, so while it might be caused by (known after apply) there seems to be more to it

nazarewk avatar Feb 09 '23 12:02 nazarewk

I've also noticed that current version does not notice manual changes done to resources showing no diff at all.

edit: looks like it was already reported at #94

nazarewk avatar Feb 15 '23 11:02 nazarewk

I've also noticed that current version does not notice manual changes done to resources showing no diff at all.

If the change did not update the annotation yes, that's a known limitation of the provider.

pst avatar Feb 15 '23 16:02 pst

I can confirm with 90% certainty, that the error occurs when values are -> (known after apply) - cross-checked plans for my "fixed" version and upstream

nazarewk avatar Feb 17 '23 09:02 nazarewk

I am running into this same issue. Is there a workaround I can look at?

CraigStuart avatar Nov 21 '23 16:11 CraigStuart

not really, I used my locally built patch to work around it https://github.com/kbst/terraform-provider-kustomization/pull/221

nazarewk avatar Nov 21 '23 16:11 nazarewk

A workaround is to do targeted applies to resources that the broken resource depends upon. We frequently run into this issue and fixing this manually all the time is tedious. :(

pascal-hofmann avatar May 15 '24 08:05 pascal-hofmann

I created a repo for reproducing this issue: https://github.com/pascal-hofmann/terraform-provider-kustomization-reproduction-219

Other terraform providers treat this case like this:

	if !d.NewValueKnown("manifest") {
		// value is not known yet, so we can't diff anything.
		return nil
	}

I'm not sure how to live with the missing validation in this case, but I also wonder if CustomizeDiff is the right place for this in the first place.

Edit: Seems like it is:

	// CustomizeDiff is called after a difference (plan) has been generated
	// for the Resource and allows for customizations, such as setting values
	// not controlled by configuration, conditionally triggering resource
	// recreation, or implementing additional validation logic to abort a plan.

Source: https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema

pascal-hofmann avatar May 15 '24 10:05 pascal-hofmann