Automated full stack test
If possible, it would be nice to automate a test of the full stack, including creating a cluster and installing a few common tools (such as Longhorn and NeuVector, but potentially others).
I imagine the test would look something like:
- Install the operator and configure it
- Build the bootstrap ISO
- Boot machines with the ISO, register them, then reboot
- Install k3s using Rancher provisioning
- Install Longhorn and NeuVector
- Smoke test Longhorn with a workload that can save and recall files from a volume
- Smoke test NeuVector by adding a rule that quarantines a workload
- Upgrade
- Repeat the smoke tests
- Tear down and report
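The Longhorn smoke test above could be as simple as a pod that writes a file to a Longhorn-backed volume and reads it back. A minimal sketch (names are illustrative, and it assumes Longhorn's default `longhorn` StorageClass):

```yaml
# PVC backed by Longhorn (assumes the default "longhorn" StorageClass)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: smoke-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
---
# Pod that saves a file to the volume and reads it back; re-running it
# after the upgrade step checks that the data survived
apiVersion: v1
kind: Pod
metadata:
  name: smoke-test-writer
spec:
  restartPolicy: Never
  containers:
    - name: writer
      image: busybox
      command: ["sh", "-c", "echo smoke > /data/probe && cat /data/probe"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: smoke-test-pvc
```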
> Install k3s using Rancher provisioning

K3s is Tech Preview in Rancher; better to use RKE2 (the E2E tests have been modified for this).

> Install Longhorn and NeuVector

I'm not sure we currently have enough runners to use Longhorn on them. And is it really a useful use case currently?
> K3s is Tech Preview in Rancher; better to use RKE2.

My thinking for K3s is that it is what we expect our edge users to use primarily. It may be marked as Tech Preview, but we intend to support both for the edge.

> Not sure that we currently have enough runners to use Longhorn.

Yes, we need to provide a storage layer that allows workloads to move between nodes (the local-path provisioner adds node constraints, because the storage lives on that specific node).
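To illustrate that constraint: the local-path provisioner creates PersistentVolumes with a `nodeAffinity` that pins any consumer of the volume to the node holding the data, so the workload cannot be rescheduled elsewhere. A Longhorn-backed PV has no such affinity. An excerpt of the kind of PV local-path produces (node name illustrative):

```yaml
# Excerpt of a PV created by the local-path provisioner: the nodeAffinity
# pins any pod using this volume to the node that holds the data
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-example-local-path
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1   # the workload is stuck on this node
```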
Why do we need this? Once we have installed RKE2/K3s and those deployments are up, isn't their usage out of Elemental's scope?
If they are installed using the official deployments from Rancher, Elemental has nothing to do there; unless some specific paths or similar are needed, they should work like a manually installed cluster.
They rely on host-level capabilities. For example, Longhorn needs the iSCSI packages on the host.
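As an illustration of that host-level dependency, a cloud-init style fragment could make sure the iSCSI bits Longhorn needs are present on each node (package and service names are the usual openSUSE/SLE ones, shown here as an assumption):

```yaml
#cloud-config
# Host prerequisite for Longhorn: the iSCSI initiator must be installed
# and running on every node before the Longhorn chart is deployed
packages:
  - open-iscsi
runcmd:
  - systemctl enable --now iscsid
```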
My rationale for the tests is to make sure we can be confident that something trivial like that doesn't get missed, causing a bad upgrade to be distributed to deployments that are hard to repair if something goes wrong. The automated rollback should kick in for anything that's hard to recover from, so this is just a second line of protection.
It's not high priority, but I think this type of test is worthwhile even if it's just a sanity check that we expect to never fail.
Interesting. Do we have a list of products that we would support 100%, i.e. the Rancher products (Longhorn, Harvester, etc.)?
We could start with that: having a list of the Rancher products that are usually deployed and that we offer support for, so we can ensure they are properly supported.
I think NeuVector and Longhorn would be the ones I'm most concerned with. Harvester is a bit of an outlier there, as it's currently built into its own appliance stack and I expect that they would do the E2E testing.
It likely doesn't need to run on every push to main, but it should run before a new image is published.
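In CI terms, that cadence could mean gating the workflow on tag/release events rather than pushes to main. A hypothetical GitHub Actions trigger sketch:

```yaml
# Hypothetical workflow triggers: run the full-stack test before an image
# is published (on release tags or manual dispatch) instead of on every
# push to main
on:
  push:
    tags:
      - "v*"           # release candidates about to be published
  workflow_dispatch: {} # allow manual runs for debugging
```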