
streamline the release e2e

Open • mitchdraft opened this issue 5 years ago • 0 comments

e2e tests during release have been flaky

  • The tutorial e2e fails at the step that expects the experiment to be in the failed state; the experiment is still in the started state

I think it is a combination of two things:

  • incomplete cluster refresh after completed tests
    • this seems to leave the prometheus server in an unstable state. It crashes a few times, which can prevent glooshot from registering the failure conditions
  • insufficient timeouts - we don't allow enough time for the first image pull, so retries are needed. The first attempt primes the image cache, and the second attempt passes the test

The following workaround allowed v0.0.5 to pass

  • clear the resources manually
        kubectl delete ns bookinfo
        kubectl delete ns glooshot
        # namespace (do in background to ignore not-exist error)
        kubectl delete ns istio-system &
        # cluster-scoped resources
        for i in $(kubectl get customresourcedefinitions -o=jsonpath="{.items[*].metadata.name}"); do echo $i | grep istio | xargs kubectl delete customresourcedefinition; done
        for i in $(kubectl get clusterrole -o=jsonpath="{.items[*].metadata.name}"); do echo $i | grep istio | xargs kubectl delete clusterrole; done
        for i in $(kubectl get clusterrolebinding -o=jsonpath="{.items[*].metadata.name}"); do echo $i | grep istio | xargs kubectl delete clusterrolebinding; done
        # do in background to ignore not-exist error
        kubectl delete mutatingwebhookconfiguration istio-sidecar-injector &
        # namespace-scoped resources in namespaces other than istio-system
        for n in $(kubectl get ns -o=jsonpath="{.items[*].metadata.name}"); do
            echo $n
            # delete each secret made by istio
            for i in $(kubectl get secrets -n=$n -o=jsonpath="{.items[*].metadata.name}"); do echo $i | grep istio | xargs kubectl delete secret -n=$n; done
        done
  • run the release once (expect failure, prime cache)
  • run the release again (expect success)
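
The "run once to prime, run again to pass" steps above could be wrapped in a small retry helper instead of being repeated by hand. This is a minimal sketch, not something from the glooshot repo: the function name `retry` and its argument order are assumptions, and the release command in the usage note is a placeholder.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: reruns COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "failed after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, `retry 2 30 ./run-release.sh` would allow one priming failure before the real attempt, where `./run-release.sh` stands in for whatever command actually drives the release. This only hides the slow first image pull; it does not replace longer timeouts.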

TODO

  • [ ] reset cluster after test
    • delete glooshot, bookinfo, istio, and supergloo resources
  • [ ] change test to wait for all bookinfo pods to be ready
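
A rough sketch of how the two TODO items might look in shell, assuming the namespace names used in the workaround above. The function names and the 300s default timeout are assumptions. The supergloo resources and the cluster-scoped istio resources are left out because the issue does not name them precisely.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the default timeout are assumptions.

# Delete the test namespaces. --ignore-not-found turns a missing namespace
# into a no-op, so the delete does not need to run in the background.
reset_namespaces() {
    local ns
    for ns in glooshot bookinfo istio-system; do
        kubectl delete namespace "$ns" --ignore-not-found
    done
}

# Block until every pod in the bookinfo namespace reports Ready,
# or return non-zero once the timeout expires.
wait_for_bookinfo() {
    local timeout="${1:-300s}"
    kubectl wait --for=condition=Ready pods --all -n bookinfo --timeout="$timeout"
}
```

Calling `reset_namespaces` after the test and `wait_for_bookinfo` before the experiment starts would cover both items. Note that `kubectl wait` reports an error if no pods exist yet, so it may need to run after the bookinfo deployments have been created.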

mitchdraft, Jun 10 '19 14:06