Investigate whether using Helm directly (instead of using it via Terraform) would be beneficial

Open marcelovilla opened this issue 9 months ago • 0 comments

Context

Currently, Terraform is used not only to manage the infrastructure Nebari needs to run, but also to manage different services (e.g., argo, grafana, jupyterhub, loki) via Helm. Other services are also managed with Terraform but without the Helm provider (e.g., conda-store and dask-gateway).

While there has been a lot of work put into this approach, I believe having a complex multi-stage Terraform configuration might have the following downsides:

  • Dependencies between stages require passing variables from one stage to the next, which adds complex logic to the deployment/destruction process.
  • Mixing infrastructure and service configuration in Terraform makes it hard to decouple the two, hindering approaches where they should run separately (e.g., redeploying services in an existing cluster to speed up CI feedback).
  • Keeping service configuration within Terraform forces resource destruction that adds overhead and might not be necessary when tearing down a Nebari cluster. After all, the service configuration is no longer relevant once the actual infrastructure is destroyed.
  • As we move forward and consider incorporating specific use-case extensions (or spins), the current Terraform configuration will only grow more complex.

Value and/or benefit

Using Helm directly to deploy and configure services, and having Terraform manage strictly the infrastructure for a Nebari cluster, could simplify our current deployment/destruction process. At this point this is just a hypothesis, but I believe it would allow for a better developer experience and a more sustainable code base in the long term.
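
To make the idea a bit more concrete, here is a minimal sketch of what driving Helm directly from Python could look like. It only assembles the argument list for `helm upgrade --install`; it is not how Nebari deploys services today. The function name, the release and chart names, and the values file path are all hypothetical and chosen for illustration.

```python
from typing import List, Optional, Sequence


def build_helm_upgrade_command(
    release: str,
    chart: str,
    namespace: str,
    values_files: Sequence[str] = (),
    kubeconfig: Optional[str] = None,
) -> List[str]:
    """Assemble a `helm upgrade --install` invocation as an argument list.

    Hypothetical helper: nothing here is part of Nebari's code base.
    """
    cmd = [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "--create-namespace",  # create the namespace if it does not exist yet
    ]
    # Each values file is passed with its own --values flag, in order.
    for path in values_files:
        cmd += ["--values", path]
    # Point Helm at an existing cluster, e.g. one created earlier by Terraform.
    if kubeconfig is not None:
        cmd += ["--kubeconfig", kubeconfig]
    return cmd


if __name__ == "__main__":
    # Illustrative names only; prints the command instead of running it.
    print(" ".join(build_helm_upgrade_command(
        release="jupyterhub",
        chart="jupyterhub/jupyterhub",
        namespace="dev",
        values_files=["jupyterhub-values.yaml"],
    )))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` once Terraform has finished provisioning the cluster, which would keep the service deployment step independent of the Terraform state.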

Anything else?

There are a couple of tasks that can help us get a more informed opinion about this approach:

  • [ ] Review which services rely on Helm
  • [ ] Create a POC migrating one of those services to use Helm directly
  • [ ] Discuss or outline what the approach for other services that don't currently use Helm might look like (if different from just keeping the current Terraform configuration).
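
As a possible starting point for the first task, the sketch below scans a directory tree of Terraform files for `helm_release` resources (the resource type exposed by the Terraform Helm provider) and lists them per file. The function names and the directory argument are hypothetical, and the regular expression is a simple text match, not a full HCL parser, so it could miss unusual formatting.

```python
import re
from pathlib import Path
from typing import Dict, List

# Matches declarations such as: resource "helm_release" "grafana" {
HELM_RELEASE_RE = re.compile(r'resource\s+"helm_release"\s+"([^"]+)"')


def find_helm_releases_in_text(text: str) -> List[str]:
    """Return the resource names of all helm_release blocks found in the text."""
    return HELM_RELEASE_RE.findall(text)


def find_helm_releases(root: str) -> Dict[str, List[str]]:
    """Map each .tf file under `root` to the helm_release names it declares."""
    found: Dict[str, List[str]] = {}
    for tf_file in sorted(Path(root).rglob("*.tf")):
        names = find_helm_releases_in_text(tf_file.read_text(encoding="utf-8"))
        if names:
            found[str(tf_file)] = names
    return found


if __name__ == "__main__":
    # Scans the current directory; point it at the Terraform stages instead.
    for path, names in find_helm_releases(".").items():
        print(f"{path}: {', '.join(names)}")
```

Services that never show up in the output would be candidates for the third task, since they are presumably managed through other Terraform resources.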

marcelovilla, May 02 '24 19:05