
Automated full stack test

Open • agracey opened this issue 2 years ago • 7 comments

If possible, it would be nice to automate a test of the full stack, including creating a cluster and installing a few common tools (such as Longhorn and NeuVector, but potentially others).

I imagine the test would look something like:

  1. Install and configure the operator
  2. Build the bootstrap ISO
  3. Boot machines with the ISO, register them, then reboot
  4. Install K3s using Rancher provisioning
  5. Install Longhorn and NeuVector
  6. Smoke test Longhorn with a workload that can save and recall files from a volume
  7. Smoke test NeuVector by adding a rule that quarantines a workload
  8. Upgrade
  9. Repeat the smoke tests
  10. Tear down and report
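
One way to structure steps 1 to 10 is as an ordered pipeline that stops at the first failure, still tears the environment down, and reports what passed, failed, or was skipped. The following is a minimal Python sketch under that assumption; `Stage`, `Report`, and `run_full_stack` are hypothetical names, and each real stage (building the ISO, provisioning the cluster, the smoke tests) would be a callable that raises on failure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stage type: a name plus a callable that raises on failure.
Stage = Tuple[str, Callable[[], None]]


@dataclass
class Report:
    passed: List[str] = field(default_factory=list)
    failed: List[Tuple[str, str]] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failed


def run_full_stack(stages: List[Stage], teardown: Callable[[], None]) -> Report:
    """Run stages in order, stop at the first failure, and always tear down."""
    report = Report()
    failed = False
    try:
        for name, step in stages:
            if failed:
                # Later stages depend on earlier ones, so skip them after a failure.
                report.skipped.append(name)
                continue
            try:
                step()
                report.passed.append(name)
            except Exception as exc:
                # Record which stage broke and why, then stop running stages.
                report.failed.append((name, str(exc)))
                failed = True
    finally:
        # Step 10: tear down regardless of outcome so the machines can be reused.
        teardown()
    return report
```

Steps 1 to 9 would each become one `(name, callable)` entry, for example a hypothetical `("upgrade", upgrade_nodes)`, and step 10 would be passed as `teardown`. Skipping rather than running later stages keeps the report meaningful: repeating the smoke tests after a failed upgrade would not tell you much.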

agracey • Aug 18 '22 15:08

> 4. Install k3s using rancher provisioning

K3s is a Tech Preview in Rancher, so it is better to use RKE2 (the E2E tests have been modified for this).

> 5. install Longhorn and NeuVector

I'm not sure our current runners have enough capacity to run Longhorn. And is it really a useful use case at the moment?

ldevulder • Aug 26 '22 14:08

> 4. Install k3s using rancher provisioning

> K3s is Tech Preview in Rancher, better to use RKE2 (E2E tests have been modified for this).

My thinking on K3s is that it's what we expect our edge users to use primarily. It may be marked as Tech Preview, but we intend to support both for the edge.

> 5. install Longhorn and NeuVector

> Not sure that we currently have enough runner to use longhorn on it. And is it really a useful usecase currently?

Yes, we need to provide a storage layer that allows workloads to move between nodes (the local-path provisioner adds node constraints because its storage lives on that specific node).
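
The save-and-recall smoke test (step 6) can also check that a workload's data follows it across nodes: write a file from a pod on one node, then read it back from a pod on another. The sketch below only builds the Kubernetes manifests as Python dicts. It assumes Longhorn's default StorageClass is named `longhorn`; the object names, the `busybox` image, and the probe string are hypothetical choices. Because the claim is `ReadWriteOnce`, the harness should wait for the writer pod to succeed and delete it before creating the reader.

```python
from typing import Any, Dict, List

# Assumption: Longhorn's default StorageClass name.
STORAGE_CLASS = "longhorn"
PVC_NAME = "smoke-data"       # hypothetical claim name
PROBE = "longhorn-smoke-ok"   # hypothetical payload written, then read back


def pvc() -> Dict[str, Any]:
    """A small ReadWriteOnce claim backed by the Longhorn StorageClass."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": PVC_NAME},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": STORAGE_CLASS,
            "resources": {"requests": {"storage": "1Gi"}},
        },
    }


def pod(name: str, node: str, script: str) -> Dict[str, Any]:
    """A one-shot busybox pod pinned to `node` with the claim mounted at /data."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "nodeName": node,  # pin explicitly so writer and reader use different nodes
            "restartPolicy": "Never",
            "containers": [{
                "name": "probe",
                "image": "busybox",
                "command": ["sh", "-c", script],
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }],
            "volumes": [{"name": "data", "persistentVolumeClaim": {"claimName": PVC_NAME}}],
        },
    }


def smoke_manifests(writer_node: str, reader_node: str) -> List[Dict[str, Any]]:
    if writer_node == reader_node:
        raise ValueError("writer and reader must run on different nodes")
    return [
        pvc(),
        pod("smoke-writer", writer_node, f"echo {PROBE} > /data/probe.txt"),
        # grep -qx exits non-zero if the file is missing or differs, so the pod fails.
        pod("smoke-reader", reader_node, f"grep -qx {PROBE} /data/probe.txt"),
    ]
```

Each manifest can be serialized with `json.dumps` and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML. With node-local storage such as the local-path provisioner, the reader on the second node would fail, which is the constraint this check is meant to catch.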

agracey • Aug 26 '22 20:08

Why do we need this? Once RKE2/K3s is installed and those deployments are up, how they are used is out of Elemental's scope, isn't it?

If they are installed using the official Rancher deployments, Elemental has nothing to do with them unless some paths or similar are needed. Shouldn't they work just as they do on a manually installed cluster?

Itxaka • Sep 02 '22 13:09

They rely on host-level capabilities. For example, Longhorn needs the iSCSI packages.

My rationale for the tests is to be confident that something trivial like that doesn't get missed, causing a bad upgrade to be distributed to deployments that are hard to repair if something goes wrong. The automated rollback should kick in for anything that's hard to recover from, so these tests are just a second line of protection.

It's not high priority, but I think this type of test is worthwhile, even if it's just a sanity check that we expect never to fail.
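
A cheap guard against the iSCSI-style regression described above is to check, on a freshly booted (and freshly upgraded) node, that the host binaries each workload relies on are present. This is a minimal sketch, assuming Longhorn's dependency can be detected by the presence of `iscsiadm` (shipped by open-iscsi); `HOST_REQUIREMENTS` and `missing_host_tools` are hypothetical, and the mapping is illustrative rather than an official support matrix. It only checks that the binary is on `PATH`, not that the `iscsid` service is running.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Assumption: illustrative mapping of product -> host binaries it needs.
# Not an official support matrix; extend it per supported product.
HOST_REQUIREMENTS: Dict[str, List[str]] = {
    "longhorn": ["iscsiadm"],  # provided by the open-iscsi package
}


def missing_host_tools(
    products: List[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, List[str]]:
    """Return {product: [missing binaries]} for binaries not found on PATH."""
    missing: Dict[str, List[str]] = {}
    for product in products:
        absent = [b for b in HOST_REQUIREMENTS.get(product, []) if which(b) is None]
        if absent:
            missing[product] = absent
    return missing
```

Running it before and after the upgrade step would flag an image that dropped a package a supported product depends on; the `which` parameter exists so the check can be exercised without a real node.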

agracey • Sep 02 '22 14:09

Interesting. Do we have a list of products that we would support 100%, i.e. the Rancher products (Longhorn, Harvester, etc.)?

We could start with that: a list of those products, so we can be 100% sure that the Rancher products that are usually deployed, and that we offer support for, are properly supported.

Itxaka • Sep 02 '22 15:09

I think NeuVector and Longhorn would be the ones I'm most concerned with. Harvester is a bit of an outlier there, as it's currently built into its own appliance stack, and I expect that team would do the E2E testing.

agracey • Sep 02 '22 15:09

It likely doesn't need to run on every push to main, but it should run before a new image is published.
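
Running the suite as a gate in front of publishing, rather than on every push, could look like the sketch below. `publish_if_green`, `run_full_stack_test`, and `publish` are hypothetical names; the point is only that the publish step is unreachable unless the full-stack run against that exact image passed.

```python
from typing import Callable


def publish_if_green(
    image: str,
    run_full_stack_test: Callable[[str], bool],
    publish: Callable[[str], None],
) -> bool:
    """Publish `image` only if the full-stack test passes against that same image."""
    if not run_full_stack_test(image):
        # Leave the image unpublished; the test report explains why.
        return False
    publish(image)
    return True
```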

agracey • Sep 02 '22 15:09