Future Proofing Provider

Open phillebaba opened this issue 2 years ago • 1 comments

A lot has happened since the initial development of this provider, and the number of users seems to have only increased. As it is a critical component for those who bootstrap Flux with Terraform, I thought it would be a good idea to discuss the most common issues that users face today and consider alternative solutions, if there are any. The main issues that I currently see end users facing are the following.

  • The general configuration is complicated. A lot of people expect the provider to do the actual installation of Flux instead of just exposing the manifests. This means that the easy setup is a lot more difficult than I would like, and anything slightly more involved can become very complicated quickly.
  • There have been, and still are, issues related to API version upgrades which can cause existing deployments to break.
  • Customization of the bootstrapping is possible today but is not entirely logical, so it requires us to document each use case. Additionally, working with YAML in Terraform is not the greatest experience, which adds more challenges for end users.

So I have spent some of my free time over the last two months attempting to implement some alternative solutions, and wanted to share my thoughts before creating a PR. As I see it we have basically two options. The first is to improve the experience of the current solution and play into the strengths of public providers like kubectl, github, azure-devops, etc. The other is to implement all of the logic inside the provider, attempting to share code with the CLI installation.

New Bootstrap Resource

An option that some people may like is for the provider to take full responsibility for the bootstrap, just like the CLI does. It would make configuration really easy for new users, as the Terraform required could, in theory at least, be very simple.

provider "flux" {
  kubernetes {
    ...
  }
}

resource "flux_install" "this" {
  target_path = var.target_path
  url         = var.clone_url
}

This resource would have to be responsible for adding manifests to the cluster and committing the files to the correct repositories. This means that the resource needs to implement both a git client and a Kubernetes client. It would have to track state diffs in both the cluster and the repository. In theory this is possible to build, but it includes challenges that the CLI does not have to solve.

The major challenge that I have found while attempting to implement just the Kubernetes component of this resource is tracking state drift between what is in the Kubernetes cluster and what is expected by the resource. Implementing this logic when there is a 1:1 relationship between Kubernetes resources and Terraform resources is straightforward, as there are methods to tell Terraform that a resource needs to be created again. It is however a lot more complicated to do this for a Terraform resource that manages multiple Kubernetes resources. This is probably the reason why the kubectl Terraform provider does not implement support for multidoc YAML. Dealing with this type of state drift for git repositories as well would make the problem even more challenging. I don't really see increasing the number of Flux Terraform resources used to configure the bootstrap as an alternative, as that would just replicate the issues that we are currently facing.

If we are going to go with this option we will first need to prove that it is possible to reliably get Terraform to detect resource state drift and re-apply the required changes when a single Terraform resource manages multiple Kubernetes resources. This includes the case where one of many resources has been removed and needs to be re-applied to the cluster.

Terraform Kubernetes Provider

When this provider was initially developed, the ability to create custom resources with the official Terraform Kubernetes provider was in early alpha stages. By now that option has been out for a while in the form of the kubernetes_manifest resource. One option could be to automatically convert YAML to HCL and publish a module which would do the bootstrapping. The benefit is that it would rely on an official provider and would make it easier for end users to customize their deployments. A variation would be for the module to only publish the CRDs and Deployments, while end users would be responsible for the bootstrapping of GitRepositories and Kustomizations.
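As a rough sketch of what converted HCL could look like, here is a single Flux GitRepository expressed with kubernetes_manifest. It assumes the official Kubernetes provider is already configured and that the Flux CRDs exist in the cluster. The resource name, interval, branch and apiVersion are placeholders that depend on the Flux version in use.

```hcl
# Hypothetical input; mirrors the clone_url variable used earlier in this issue.
variable "clone_url" {
  type = string
}

# One Kubernetes object per Terraform resource, written as an HCL object
# instead of a YAML string.
resource "kubernetes_manifest" "flux_system_repository" {
  manifest = {
    apiVersion = "source.toolkit.fluxcd.io/v1beta2" # assumption, varies by Flux version
    kind       = "GitRepository"
    metadata = {
      name      = "flux-system"
      namespace = "flux-system"
    }
    spec = {
      interval = "1m0s"
      url      = var.clone_url
      ref = {
        branch = "main" # placeholder branch
      }
    }
  }
}
```

Because the manifest is plain HCL, end users could override individual fields with ordinary Terraform expressions rather than patching YAML.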

There are however a couple of downsides with this solution.

  • This resource relies on server-side apply, which means that the cluster creation and Flux bootstrapping have to be done in separate Terraform states.
  • It would require some custom solution to convert HCL back to YAML so that it can be committed to the bootstrap repository.
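For the second point, Terraform's built-in yamlencode function might be a starting point, although I have not verified that it is sufficient. The sketch below defines a manifest once as an HCL object, applies it with kubernetes_manifest, and renders the same object as YAML. The local and output names are hypothetical, and the output is valid YAML but is not guaranteed to match the formatting that the Flux CLI generates.

```hcl
locals {
  # Hypothetical manifest, defined once and reused for both apply and commit.
  flux_namespace = {
    apiVersion = "v1"
    kind       = "Namespace"
    metadata = {
      name = "flux-system"
    }
  }
}

# Applied to the cluster through the official Kubernetes provider.
resource "kubernetes_manifest" "flux_namespace" {
  manifest = local.flux_namespace
}

# The same object rendered as YAML; this string could be handed to whichever
# resource commits files to the bootstrap repository.
output "flux_namespace_yaml" {
  value = yamlencode(local.flux_namespace)
}
```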

Continue With Datasource

Continuing with the current datasource configuration may be the simplest solution, but it will require us to solve all of the major issues that we are currently facing. The first of them is breaking existing deployments. A first step here is to update all of the examples to set apply_only to true. This will cause the resources to not be removed from the cluster when they are deleted from the state. This option will resolve the immediate problems which may catch end users as they upgrade today, and buy some time if some other solution is chosen. I don't see a problem with Terraform not removing Flux-related resources, as the clean-up process is still very shaky and for the most part will fail when deleting a cluster with Terraform, because the order in which resources are deleted matters. So it won't really remove features that users have today.

On top of this, other issues, such as the API version being included in the key, have to be resolved if this is to be considered a long-term solution. Either way we should consider adding the apply_only option now.
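A sketch of how the install example could look with both changes, based on the current datasource pattern with the kubectl provider. The key format shown here (kind/namespace/name, without the API version) is only one possible choice and an assumption on my part; it could collide if two API groups ever define the same kind with the same name.

```hcl
variable "target_path" {
  type = string
}

data "flux_install" "main" {
  target_path = var.target_path
}

# Split the multidoc YAML into individual documents.
data "kubectl_file_documents" "install" {
  content = data.flux_install.main.content
}

locals {
  # Key each document by kind/namespace/name, leaving out apiVersion so that
  # an API version upgrade does not change the resource address.
  install = {
    for doc in data.kubectl_file_documents.install.documents :
    lower(join("/", compact([
      yamldecode(doc).kind,
      lookup(yamldecode(doc).metadata, "namespace", ""),
      yamldecode(doc).metadata.name,
    ]))) => doc
  }
}

resource "kubectl_manifest" "install" {
  for_each  = local.install
  yaml_body = each.value

  # Do not delete the object from the cluster when it leaves the state.
  apply_only = true
}
```

Changing the for_each keys in an existing state makes Terraform plan a destroy and create for each instance, so apply_only would likely need to be rolled out before any key change.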

phillebaba avatar May 03 '22 21:05 phillebaba

After a couple of discussions and some more research, it seems like the best option would be to move forward with a new resource which manages the full installation process. It is now possible to diff multi-doc manifests thanks to the new Terraform Framework and AttributeModifiers. I am going to start working on adding new resources which manage the full lifecycle now. Hopefully it will simplify the installation process with Terraform.

There are still a couple of unknowns, specifically regarding the git component of the installation process. The lack of multi ack support in the git library still needs to be figured out. The initial implementation may be limited to those git repositories which are compliant with the Flux CLI installation method.

phillebaba avatar Aug 03 '22 21:08 phillebaba