glooshot streamline the release e2e

Open mitchdraft opened this issue 5 years ago • 0 comments

e2e tests during release have been flaky

The tutorial e2e has been failing at the point where it expects the experiment to be in the failed state, but the experiment is still in the started state

I think it is a combination of two things:

- incomplete cluster refresh after completed tests
  - this seems to leave the prometheus server in an unstable state. It crashes a few times, which can prevent glooshot from registering the failure conditions
- insufficient timeouts
  - we don't give enough time on the first image pull, so retries are needed. The first attempt primes the container repo, the second attempt passes the test
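
One way to check the first suspicion is to look at container restart counts after a run. The sketch below is only an illustration: `restarted_pods` is a hypothetical helper name, and the namespace that hosts the prometheus server is not stated in this issue, so the caller has to pass it in.

```shell
# restarted_pods <namespace>
# Prints "<pod-name> <total restarts>" for every pod in the namespace
# whose containers have restarted at least once.
# Hypothetical helper; the namespace is an assumption supplied by the caller.
restarted_pods() {
    local ns=$1
    kubectl get pods --namespace "$ns" \
        -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].restartCount}{"\n"}{end}' |
        # Field 1 is the pod name, the remaining fields are per-container counts.
        awk '{ total = 0; for (i = 2; i <= NF; i++) total += $i; if (total > 0) print $1, total }'
}
```

Calling `restarted_pods <namespace>` after a failed e2e run and seeing the prometheus pod listed would support the crash theory. An empty result points more toward the timeout explanation.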

The following workaround allowed `v0.0.5` to pass

- clear the resources manually

        kubectl delete ns bookinfo
        kubectl delete ns glooshot
        # namespace (do in background to ignore not-exist error)
        kubectl delete ns istio-system &
        # cluster-scoped resources
        for i in `kubectl get customresourcedefinitions -o=jsonpath="{.items[*].metadata.name}"`; do echo $i |grep istio|xargs kubectl delete customresourcedefinition ; done
        for i in `kubectl get clusterrole -o=jsonpath="{.items[*].metadata.name}"`; do echo $i |grep istio|xargs kubectl delete clusterrole ; done
        for i in `kubectl get clusterrolebinding -o=jsonpath="{.items[*].metadata.name}"`; do echo $i |grep istio|xargs kubectl delete clusterrolebinding ; done
        # do in background to ignore not-exist error
        kubectl delete mutatingwebhookconfiguration istio-sidecar-injector &
        # namespace-scoped resources in namespaces other than istio-system
        for n in `kubectl get ns -o=jsonpath="{.items[*].metadata.name}"`; do
            echo $n;
            # delete each secret made by istio
            for i in `kubectl get secrets -n=$n -o=jsonpath="{.items[*].metadata.name}"`; do echo $i |grep istio|xargs kubectl delete secret -n=$n; done
        done

- run the release once (expect failure, prime cache)
- run the release again (expect success)
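
The "run once, then run again" step could be wrapped in a small retry helper so the second attempt happens automatically. This is a minimal sketch, not part of the release tooling: `retry` is a hypothetical function, and the command that starts the release is not named in this issue, so it is left as an argument.

```shell
# retry <max_attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or max_attempts is reached.
# Hypothetical helper for illustration only.
retry() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    while true; do
        # Run the command in the current shell; stop at the first success.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, `retry 2 30 <release command>` would mirror the manual workaround: the first attempt primes the image cache and the second is expected to pass. A retry hides the underlying timeout problem rather than fixing it, so it is best treated as a stopgap.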

TODO

- [ ] reset cluster after test
  - delete glooshot, bookinfo, istio, and supergloo resources
- [ ] change test to wait for all bookinfo pods to be ready
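
Both TODO items could be sketched as shell helpers along these lines. The function names are hypothetical, and some details are assumptions: the default namespace list uses only the names that appear in this issue (the supergloo namespace is not named here, so it would have to be passed explicitly), the cluster-scoped Istio resources from the manual workaround are not covered, and the `300s` default timeout is a guess.

```shell
# reset_cluster [namespace...]
# Deletes the given namespaces, or the ones named in this issue by default.
# --ignore-not-found avoids the "not exist" error without backgrounding.
reset_cluster() {
    if [ "$#" -eq 0 ]; then
        set -- glooshot bookinfo istio-system
    fi
    local ns
    for ns in "$@"; do
        kubectl delete namespace "$ns" --ignore-not-found
    done
}

# wait_for_pods <namespace> [timeout]
# Blocks until every pod in the namespace reports the Ready condition.
# The default timeout is an assumption, not a measured value.
wait_for_pods() {
    local ns=$1 timeout=${2:-300s}
    kubectl wait --namespace "$ns" --for=condition=Ready pods --all --timeout="$timeout"
}
```

A test could call `reset_cluster` during teardown and `wait_for_pods bookinfo` before starting the experiment, so the failure conditions are only evaluated once the bookinfo pods are actually serving.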

Jun 10 '19 14:06 mitchdraft
