Add nebari upgrade tests

Open Adam-D-Lewis opened this issue 1 year ago • 2 comments

Context

Currently we have no automated deployment tests covering nebari upgrade. Such a test would have prevented the data loss described in the context of https://github.com/nebari-dev/nebari/issues/2700. More broadly, it would help catch any error that occurs during the Nebari upgrade process but not during an initial deployment. These tests should be added.

I propose that the tests upgrade from the last release of Nebari to the latest develop branch, and then check whether files, conda envs, and users persisted across the upgrade.
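As a rough sketch of that persistence check, assume the test can collect the names of files, conda envs, and users before and after the upgrade. How those names are collected is left open here, and `missing_after_upgrade` is a hypothetical helper, not part of Nebari:

```python
def missing_after_upgrade(before, after):
    """Return, per category, the names present before the upgrade but gone after it.

    Both arguments map a category name (e.g. "files", "conda_envs", "users")
    to an iterable of names. Collecting those names is out of scope here.
    """
    lost = {}
    for category, names in before.items():
        gone = set(names) - set(after.get(category, ()))
        if gone:
            lost[category] = sorted(gone)
    return lost


# Illustrative, made-up snapshots (not taken from a real deployment)
before = {
    "files": ["notebook.ipynb"],
    "conda_envs": ["base-env", "analyst-env"],
    "users": ["alice"],
}
after = {
    "files": ["notebook.ipynb"],
    "conda_envs": ["base-env"],
    "users": ["alice"],
}

print(missing_after_upgrade(before, after))  # {'conda_envs': ['analyst-env']}
```

An empty result would mean that everything recorded before the upgrade is still present afterwards.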

We already have https://github.com/nebari-dev/nebari/blob/develop/.github/workflows/test_local_integration.yaml, https://github.com/nebari-dev/nebari/blob/develop/.github/workflows/test_aws_integration.yaml, https://github.com/nebari-dev/nebari/blob/develop/.github/workflows/test_azure_integration.yaml, https://github.com/nebari-dev/nebari/blob/develop/.github/workflows/test_gcp_integration.yaml as well as the deployment tests. These could be used as references when adding a nebari upgrade test.

Adam-D-Lewis avatar Sep 03 '24 17:09 Adam-D-Lewis

@Adam-D-Lewis has provided a good direction. To summarize: we should add a new action file that mostly copies what is already in place under test_local_integration, but installs the latest Nebari release instead of installing from source at the current branch version.

The main goal for this new test would be:

  • Deploy the latest released version of Nebari, upgrade the YAML file (nebari upgrade), attest to the changes in the YAML, re-deploy, and attest service liveness.

Some parts, like the "attest" final step, might seem ambiguous right now, so this can be discussed further.

viniciusdc avatar Oct 10 '24 10:10 viniciusdc

From an offline convo I had with @dcmcand:

  • We want the workflow to be triggered by pre-releases or, optionally, manually
  • We want to use the local provider

The steps should be

  1. Deploy the most recent released version of nebari
  2. Perform a health check (curl some endpoints for now)
  3. Run nebari upgrade for the version that triggered the workflow
  4. Perform the health check again
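
The four steps above could be sketched roughly as follows. This is only an illustration: `is_healthy`, `run_upgrade_test`, the endpoint list, and the injected `deploy` and `upgrade` callables are hypothetical names, and the actual Nebari commands and URLs are deliberately left to the caller.

```python
import urllib.request


def is_healthy(url, timeout=10.0):
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses.
        return False


def run_upgrade_test(deploy, upgrade, endpoints, check=is_healthy):
    """Run the four steps in order and fail on the first unhealthy stage.

    `deploy` and `upgrade` are caller-supplied callables (assumptions of
    this sketch); they would wrap whatever commands the workflow runs.
    """

    def assert_healthy(stage):
        failed = [url for url in endpoints if not check(url)]
        if failed:
            raise RuntimeError(f"unhealthy {stage}: {failed}")

    deploy()                          # 1. deploy the most recent release
    assert_healthy("before upgrade")  # 2. health check
    upgrade()                         # 3. run the upgrade for the triggering version
    assert_healthy("after upgrade")   # 4. health check again
```

Passing `check` as a parameter keeps the sequencing testable without a live deployment. If the deployment serves a self-signed certificate, `urlopen` would reject it by default, so a real health check would need to handle TLS explicitly.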

Depending on the scope of the health checks, which we want to keep minimal for the initial version of this test, this might or might not detect whether the upgrade breaks anything.

In any case, it will not be able to prevent the data loss described in the top comment, which happened because a resource was deleted and recreated. To test for that, we would need to add some state between steps 1 and 2 above and verify its integrity after step 4.
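
One way to sketch that state check, assuming the test can write to and read from some directory that is expected to survive the upgrade (the directory, the file name, and both helper functions below are hypothetical):

```python
import hashlib
import os
from pathlib import Path


def plant_sentinel(directory):
    """Write a file with random content and return its path and SHA-256 digest."""
    payload = os.urandom(1024)
    path = Path(directory) / "upgrade-sentinel.bin"  # hypothetical file name
    path.write_bytes(payload)
    return path, hashlib.sha256(payload).hexdigest()


def sentinel_intact(path, expected_digest):
    """Return True if the file still exists and its content is unchanged."""
    path = Path(path)
    if not path.is_file():
        return False
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected_digest
```

`plant_sentinel` would run between steps 1 and 2, and `sentinel_intact` after step 4. If the storage behind the directory were deleted and recreated during the upgrade, the file would be missing and the check would return False.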

pmeier avatar Oct 17 '24 12:10 pmeier