terraform-provider-argocd

Bug Modification of cluster (should recreate the resource or allow overriding the existing one)

Open NGAJean opened this issue 11 months ago • 5 comments

Terraform Version, ArgoCD Provider Version and ArgoCD Version

Terraform version: hashicorp/terraform:1.8
ArgoCD provider version: argoproj-labs/argocd v7.1.0
ArgoCD version: v2.12.3+6b9cd82

Affected Resource(s)

  • argocd_cluster

Debug Output

Error: failed to update cluster https://xxxxx
rpc error: code = PermissionDenied desc = permission denied

Steps to Reproduce

  1. register a first cluster
  2. register a new cluster with the same name
  3. modification failed

Expected Behavior

Automatically trigger a recreation of the resource, or allow overriding the existing one via a parameter similar to the CLI's upsert flag.

Actual Behavior

Modification failed

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

NGAJean avatar Dec 02 '24 17:12 NGAJean

Hi @NGAJean would you mind sharing an example TF file to reproduce the issue? I cannot imagine what you mean by "register a new cluster with the same name". Is this a second argocd_cluster resource you are adding or something you do via CLI/UI?

the-technat avatar Dec 06 '24 19:12 the-technat

Hi @the-technat, the use case is the following. We are using EKS, and sometimes when we change some low-level parameters we have to recreate the entire cluster. The new cluster has the same name as the previous one but a different endpoint and certificate authority (CA). To update the cluster definition on the ArgoCD side with the CLI, we can use the upsert flag to override an existing cluster (the previous one) with the same name even if the spec differs (the new one). What we expect from this ArgoCD Terraform provider is that if the endpoint and/or the CA of an existing argocd_cluster Terraform resource changes, it triggers a recreation of this Terraform resource instead of failing on the apply command. Is it clearer now?

NGAJean avatar Dec 07 '24 13:12 NGAJean

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Feb 06 '25 12:02 github-actions[bot]

@the-technat did you understand the use case?

NGAJean avatar Feb 07 '25 06:02 NGAJean

We are also interested in this, as it's useful when a cluster has been accidentally deleted: recreating it could simply replace the old cluster entry in ArgoCD (the --upsert option of argocd cluster add).

Would it be feasible to add an extra parameter to support the upsert modifier? I'm willing to contribute this change.

JorTurFer avatar May 14 '25 10:05 JorTurFer

> in case a cluster has been accidentally deleted, to recreate the ArgoCD cluster just replacing the old one

@JorTurFer I'm not following you here. If the cluster was accidentally deleted then I would expect the provider to detect that and to offer to create a new one (there is no old one to replace).

onematchfox avatar Jul 21 '25 08:07 onematchfox

Yeah, that's correct, but in that scenario, Terraform tries to update the cluster in ArgoCD and doesn't use the --upsert modifier (which the CLI supports), so Terraform creates the Kubernetes cluster at the vendor but fails to create it in ArgoCD. In this case, I have to go to ArgoCD and remove the cluster from it; then Terraform can recreate the cluster in ArgoCD.

Basically, I miss the --upsert modifier from the argocd CLI -> https://argo-cd.readthedocs.io/en/latest/user-guide/commands/argocd_cluster_add/

--upsert                             Override an existing cluster with the same name even if the spec differs

JorTurFer avatar Jul 21 '25 09:07 JorTurFer

> Yeah, that's correct, but in that scenario, Terraform tries to update the cluster in ArgoCD

Just so I'm clear... You're saying that if a cluster (that was created via Terraform) is deleted outside of Terraform, then the provider will try to update the non-existent cluster rather than creating a new one?

onematchfox avatar Jul 22 '25 07:07 onematchfox

> Just so I'm clear... You're saying that if a cluster (that was created via Terraform) is deleted outside of Terraform, then the provider will try to update the non-existent cluster rather than creating a new one?

(ArgoCD) clusters are "unique" by name, so let's say you are using the argocd CLI: if the name is the same but the spec is different, you can add the --upsert modifier to enforce the spec change under the same name. The Terraform provider doesn't support changing the spec if the cluster already exists, as it doesn't support the --upsert modifier that the CLI does.

JorTurFer avatar Jul 22 '25 09:07 JorTurFer

OK. In that case, we're talking about bringing a resource that was created outside of Terraform into the Terraform state so that it can be managed by Terraform?

Upsert isn't a "Terraform thing" - at least I don't know of any providers that provide that sort of functionality, but feel free to point me to any that do. The idiomatic way of doing this in Terraform would be to use one of the import mechanisms.
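As a sketch of that idiomatic route: an `import` block (available since Terraform 1.5) adopts a cluster that already exists in Argo CD into the state. This assumes the provider accepts the cluster's server URL as the import ID, which matches how this thread describes the resource ID but should be checked against the provider documentation. The URL and token below are placeholders.

```hcl
# Sketch only: bring an Argo CD cluster that was registered outside of
# Terraform under Terraform management.
# Assumption: the import ID is the cluster's server URL.
import {
  to = argocd_cluster.kubernetes
  id = "https://<kubeapi-address-of-existing-cluster>:6443"
}

resource "argocd_cluster" "kubernetes" {
  # Must match the cluster that is being imported.
  server = "https://<kubeapi-address-of-existing-cluster>:6443"

  config {
    bearer_token = "<bearer-token>"

    tls_client_config {
      insecure = true
    }
  }
}
```

Note that importing only helps when the cluster entry in Argo CD already has the address written in the configuration; it does not by itself move an existing entry to a new address.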

onematchfox avatar Jul 22 '25 10:07 onematchfox

> What we expect from this ArgoCD Terraform provider is that if the endpoint and/or the CA of an existing argocd_cluster Terraform resource changes, it triggers a recreation of this Terraform resource instead of failing on the apply command. Is it clearer now?

@NGAJean, are you saying that if you update a value on an existing argocd_cluster resource, this does not result in the cluster being updated in ArgoCD? Or just that the error Permission denied occurs when the provider tries to perform the update? I'm a bit surprised if this is the case, as there are tests covering this sort of behaviour (maybe some field is missing though). Can you supply a minimal reproduction/example code as @the-technat asked for originally?

onematchfox avatar Jul 22 '25 10:07 onematchfox

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 20 '25 12:09 github-actions[bot]

> > What we expect from this ArgoCD Terraform provider is that if the endpoint and/or the CA of an existing argocd_cluster Terraform resource changes, it triggers a recreation of this Terraform resource instead of failing on the apply command. Is it clearer now?

> @NGAJean, are you saying that if you update a value on an existing argocd_cluster resource, this does not result in the cluster being updated in ArgoCD? Or just that the error Permission denied occurs when the provider tries to perform the update? I'm a bit surprised if this is the case, as there are tests covering this sort of behaviour (maybe some field is missing though). Can you supply a minimal reproduction/example code as @the-technat asked for originally?

Hi, yes, this is the problem (an existing argocd_cluster resource isn't updated if its spec changes).

Same explanation here, I don't know how to be more clear.

https://github.com/argoproj-labs/terraform-provider-argocd/issues/510#issuecomment-3101954093

NGAJean avatar Sep 21 '25 06:09 NGAJean

So after rereading the issue a couple of times I think I know what it's about. Sorry for the detail @NGAJean.

Let me try to explain with a reproduction example, divided into 4 steps.

Step 1 - Infra

Assuming: you have a host K8s cluster running a central Argo CD instance.

Next: create a child cluster that you'd like to manage and extract a token for authentication:

kind create cluster --name provider-510
kubectl apply -f external_cluster_auth.yaml
kubectl --context kind-provider-510 -n kube-system get secret argocd -o=jsonpath='{.data.token}' | base64 -d | pbcopy

external_cluster_auth.yaml

Step 2 - Connect the kind cluster to your central Argo CD

Let's assume this HCL snippet was used to register the child cluster in our central Argo CD instance:

terraform {
  required_providers {
    argocd = {
      source  = "argoproj-labs/argocd"
      version = "7.11.0"
    }
  }
}

provider "argocd" {
  port_forward = true
  username     = "admin"
  password     = "<output of: kubectl get secret -n argocd argocd-initial-admin-secret -o=jsonpath='{.data.password}' | base64 -d>"
}

## Bearer token Authentication
resource "argocd_cluster" "kubernetes" {
  server = "https://<some-kubeapi-address-reachable-from-central-argocd>:6443"

  config {
    bearer_token = "<my-copied-bearer-token-from-above>"

    tls_client_config {
      insecure = true # too lazy to copy the cert over
    }
  }
}

Do a terraform apply on this. This works perfectly fine and we see the child cluster listed in the Argo CD UI.

Step 3 - reprovision child cluster

Let's assume we weren't happy with the setup of our kind cluster and wanna replace it:

kind delete cluster --name provider-510 # deletes the cluster
kind create cluster --name provider-510 # creates a fresh cluster, same name but maybe a different config
kubectl apply -f external_cluster_auth.yaml # prepare auth method again
kubectl --context kind-provider-510 -n kube-system get secret argocd -o=jsonpath='{.data.token}' | base64 -d | pbcopy

Make sure you have the new token in the clipboard.

Step 4 - adjust HCL config for newly created cluster

Now let's assume the HCL code from above will be updated automatically by some module reference/variable whenever the child cluster is recreated (because it might also be managed using Terraform), so that our HCL snippet now renders like this:

terraform {
  required_providers {
    argocd = {
      source  = "argoproj-labs/argocd"
      version = "7.11.0"
    }
  }
}

provider "argocd" {
  port_forward = true
  username     = "admin"
  password     = "<output of: kubectl get secret -n argocd argocd-initial-admin-secret -o=jsonpath='{.data.password}' | base64 -d>"
}

## Bearer token Authentication
resource "argocd_cluster" "kubernetes" {
  server = "https://<some-new-changed-kubeapi-address>:6443"

  config {
    bearer_token = "<my-new-changed-bearer-token>"

    tls_client_config {
      insecure = true # too lazy to copy the cert over
    }
  }
}

What would we expect: The existing cluster object in Argo CD is given a new kubeapi address and bearer token

What do we get:

Error: failed to update cluster https://xxxxx
rpc error: code = PermissionDenied desc = permission denied

Why this won't work - Terraform's view

Paying attention to the output of the Terraform run in step 4 we can see that:

  • Terraform uses the kubeapi URL as its ID in the Terraform state for the cluster resource
  • Terraform knows that an argocd_cluster resource can be updated in-place
  • Terraform attempts to patch what it thinks is an existing resource by sending a PUT request to the Argo CD API
  • it fails with a permission denied error coming from the Argo CD API

What we don't know:

  • if Terraform would correctly replace the current resource in its state with a new ID
  • if there would be any other bugs

Why this won't work - Argo CD's view

Argo CD (based on the API docs and some debugging-runs) will see the following:

  • the Argo CD API uses the kubeapi URL as the unique identifier of a cluster (according to its swagger-ui)
  • so it sees a PUT request for a cluster object with a URL that doesn't exist in our central Argo CD instance
  • it responds with "permission denied" to not disclose information about whether this cluster exists or not (see https://github.com/argoproj/argo-cd/issues/19851 for more background)

What to conclude from this?

Option 1 - server field is force-new not in-place update

We could argue that since the KubeAPI URL is the unique identifier of a cluster, the provider should not treat an update to this field as an "in-place modification" but instead as a "force-recreation", and subsequently first delete the old object and then create a new one.

=> this is a fairly simple terraform schema change

I'd be in favor of this method, as it represents Terraform's way of expressing that something is unique and that an update to it requires a recreate. I'm pretty sure this works.
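The schema change itself lives in the provider, but a comparable effect can be sketched on the user side with Terraform's `replace_triggered_by` lifecycle argument. This is a workaround sketch under assumptions, not the provider's fix: the names `var.child_cluster_endpoint`, `var.child_cluster_token` and `terraform_data.child_cluster_endpoint` are hypothetical, and it assumes Terraform 1.4 or newer for the built-in `terraform_data` resource.

```hcl
# Hypothetical inputs, e.g. fed from the module that manages the child cluster.
variable "child_cluster_endpoint" {
  type = string
}

variable "child_cluster_token" {
  type      = string
  sensitive = true
}

# Tracks the endpoint so that a change to it shows up as a planned change
# on a resource that other resources can reference.
resource "terraform_data" "child_cluster_endpoint" {
  input = var.child_cluster_endpoint
}

resource "argocd_cluster" "kubernetes" {
  server = var.child_cluster_endpoint

  config {
    bearer_token = var.child_cluster_token

    tls_client_config {
      insecure = true
    }
  }

  lifecycle {
    # Plan a destroy-and-create instead of an in-place update whenever
    # the tracked endpoint changes.
    replace_triggered_by = [terraform_data.child_cluster_endpoint]
  }
}
```

With this in place, a changed endpoint should make Terraform delete the old cluster entry under its old URL and then create a new entry, rather than sending an update for a URL that Argo CD does not know.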

@NGAJean would the recreation of this terraform resource trigger other terraform resources in your workspaces to act in an unpredictable way? Like is there some resource depending on the argocd_cluster resource that would potentially be recreated too if the argocd_cluster resource is recreated?

Option 2 - introduce the upsert flag of the Argo CD cli

I haven't studied in detail what the upsert flag in Argo CD does, but I assume it triggers some logic that deletes the old cluster and recreates it using the new URL. Given that the CLI uses the same Argo CD API as the provider, it should have to deal with the same behavior. It would be interesting to see whether the CLI also returns a permission denied when the upsert flag is omitted for the same kind of operation.

The difference between this option and the other one is that it makes the force-new behavior optional and lets users decide whether they want it.

While it's always good to have options, I imagine that for complex aggregated Terraform pipelines no one would want to set an upsert=true on a Terraform resource just because some other resource changes. This should happen automatically.

In addition, it's kinda odd if a user can change the behavior of the provider using a flag. I think that could confuse users too, given that the provider documentation doesn't mention the argocd_cluster resource being recreated on changes.

Takeaway actions

Let me update the issue title to better reflect the issue & open a PR for option 1. If there are any things I've missed let me know.

the-technat avatar Oct 08 '25 09:10 the-technat

This is quite a nice definition of the current issue, yeah. Great! ❤️

JorTurFer avatar Oct 08 '25 09:10 JorTurFer

> @NGAJean would the recreation of this terraform resource trigger other terraform resources in your workspaces to act in an unpredictable way? Like is there some resource depending on the argocd_cluster resource that would potentially be recreated too if the argocd_cluster resource is recreated?

This is exactly the issue that we are facing.

In our workspaces, recreating argocd_cluster resources isn't a problem.

NGAJean avatar Oct 08 '25 14:10 NGAJean

The related PR has been merged and my local tests were successful. GitHub is currently building a patch release v7.11.2 that includes the fix. Let me know if that works for you.
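To pick up that release, the provider version constraint from the reproduction above can be raised. A minimal sketch, assuming the release is published to the registry as 7.11.2:

```hcl
terraform {
  required_providers {
    argocd = {
      source = "argoproj-labs/argocd"
      # Assumption: 7.11.2 is the first release containing the fix.
      version = ">= 7.11.2"
    }
  }
}
```

After changing the constraint, running terraform init -upgrade lets Terraform select the newer provider version and update the dependency lock file.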

the-technat avatar Oct 08 '25 15:10 the-technat