camunda-platform-helm

[ENHANCEMENT] add testing to cover unresolved references (prevent CreateContainerConfigError)

Open jessesimpson36 opened this issue 9 months ago • 0 comments

Describe the use case:

Deploying a Kubernetes manifest can often leave a pod stuck in CreateContainerConfigError. One example is a recent issue that I tried to fix and then caused again in a different way. The following env var in identity:

        - name: IDENTITY_DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: cpt-identity-postgresql

references a secret that does not exist when the subchart identityPostgresql is disabled.

The traditional mechanism of testing this would be to create a unit test and check to see if the secret cpt-identity-postgresql exists, and that the key password exists within that secret. This has some limitations:

  1. Any unit test we write covers only that one specific reference.
  2. Unit tests are error prone, and sometimes people don't understand why a test was written and remove it (I did this).
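For illustration, the check described above (does the secret exist, and does it contain the key) could be sketched roughly as follows. This is a minimal sketch, not the chart's actual test code: the function name `find_unresolved_secret_refs` is hypothetical, and it assumes the output of `helm template` has already been parsed into a list of dictionaries. It also assumes every referenced secret is rendered by the chart itself, so a secret created outside the chart would be reported as unresolved.

```python
from typing import Iterable


def find_unresolved_secret_refs(manifests: Iterable[dict]) -> list[tuple[str, str, str]]:
    """Return (env var, secret name, key) for each secretKeyRef no rendered Secret satisfies.

    Hypothetical helper: assumes `manifests` are already-parsed Kubernetes objects.
    """
    manifests = list(manifests)

    # Map each rendered Secret name to the set of keys it defines.
    secrets: dict[str, set[str]] = {}
    for doc in manifests:
        if doc.get("kind") == "Secret":
            keys = set(doc.get("data") or {}) | set(doc.get("stringData") or {})
            secrets[doc["metadata"]["name"]] = keys

    unresolved = []
    for doc in manifests:
        # Workloads such as Deployment or StatefulSet keep the pod spec under spec.template.spec.
        pod_spec = ((doc.get("spec") or {}).get("template") or {}).get("spec") or {}
        containers = (pod_spec.get("initContainers") or []) + (pod_spec.get("containers") or [])
        for container in containers:
            for env in container.get("env") or []:
                ref = (env.get("valueFrom") or {}).get("secretKeyRef")
                # References marked optional do not block container creation.
                if ref is None or ref.get("optional"):
                    continue
                if ref["key"] not in secrets.get(ref["name"], set()):
                    unresolved.append((env["name"], ref["name"], ref["key"]))
    return unresolved
```

Even in this generalized form it only covers secretKeyRef in env, which shows the first limitation: every other kind of reference (envFrom, configMapKeyRef, volumes) would need its own handling.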

A bad alternative would be an integration test specifically for scenarios with externalDatabases enabled. It's bad because it's expensive in both time and CI resources.

Here's what I propose:

I want a catch-all smoke-test that will produce a CI failure for ANY CreateContainerConfigError, that is cheap enough in time and resources to be able to account for different values.yaml setups.

We can have a suite of values.yaml files for different scenarios, such as "multitenancy enabled", "external database", or anything else that would produce these references.

For each of these values.yaml files, we can deploy the helm chart on our CI cluster, but override every command, entrypoint, and image.repository so that each container runs echo success on the alpine:latest image. This way we aren't using many system resources, and the test is quick because no application code is actually running. It only checks that the volumes, envFrom, and valueFrom references all resolve to something.
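One way to picture the override, as a sketch only: instead of setting values per component, the rendered workload manifests could be post-processed so that every container becomes a stub. This swaps in a different mechanism than overriding through values.yaml, and the function name `stub_containers` is hypothetical. It assumes the manifest is an already-parsed dictionary, and dropping the probes is an added assumption, since probes would not pass against a container that exits immediately.

```python
import copy


def stub_containers(manifest: dict, image: str = "alpine:latest") -> dict:
    """Return a copy of a workload manifest whose containers only run `echo success`.

    Hypothetical helper: env, envFrom, and volumeMounts are left untouched so the
    kubelet still has to resolve every reference before starting the container.
    """
    patched = copy.deepcopy(manifest)
    pod_spec = ((patched.get("spec") or {}).get("template") or {}).get("spec") or {}
    for section in ("initContainers", "containers"):
        for container in pod_spec.get(section) or []:
            container["image"] = image
            container["command"] = ["echo", "success"]
            container.pop("args", None)
            # Assumption: probes are removed because the stub exits right away.
            for probe in ("livenessProbe", "readinessProbe", "startupProbe"):
                container.pop(probe, None)
    return patched
```

Because only the image, command, args, and probes change, a missing secret or config map still surfaces exactly as it would with the real image.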

If the pod succeeds, the pod listing will show its status as Completed. However, if the pod fails to resolve a valueFrom reference, the status will be CreateContainerConfigError, and CI will fail because helm install --wait should fail.
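As a complement to relying on the helm exit code, the CI job could also inspect pod status directly. The sketch below is an assumption about how such a check might look, not an existing script in the repository: the function name `pods_with_config_errors` is hypothetical, and it expects the JSON printed by `kubectl get pods -o json`.

```python
import json


def pods_with_config_errors(pod_list_json: str) -> list[str]:
    """Return "pod/container" names whose container is waiting with CreateContainerConfigError.

    Hypothetical helper: `pod_list_json` is the output of `kubectl get pods -o json`.
    """
    pod_list = json.loads(pod_list_json)
    failing = []
    for pod in pod_list.get("items") or []:
        status = pod.get("status") or {}
        container_statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for cs in container_statuses:
            # A container that cannot be configured stays in the "waiting" state.
            waiting = (cs.get("state") or {}).get("waiting") or {}
            if waiting.get("reason") == "CreateContainerConfigError":
                failing.append(f'{pod["metadata"]["name"]}/{cs["name"]}')
    return failing
```

A CI step could feed the kubectl output into this function and fail the job whenever the returned list is non-empty, which also names the offending pod and container in the log.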

Related to: SUPPORT-21601 SUPPORT-21974 https://github.com/camunda/camunda-platform-helm/issues/1652

jessesimpson36 • May 20 '24 18:05