testyomesh
Continually test your Service Mesh

Enough with the memes, what is this?
I've been upgrading Istio pretty much since it was born. I've encountered my fair share of bugs during the upgrade process, and the vast majority are due to nuances in my configuration or environment, rather than obvious bugs in core components.
Therefore, I accept that when testing Istio releases in my environment, I'm going to find issues. This project is about surfacing those issues as early as possible during release testing in a qualifying environment. That said, I also run it continually on all my Istio clusters as a form of active monitoring.
How?
Like this:

Here's what each component does:
operator
Think of the operator as a chaos monkey. It runs the tasks in lib/apps/tasks to deliberately create chaos that should not cause any errors! The currently implemented tasks are:
- auth-policy: creates and deletes a security.istio.io/v1beta1/AuthorizationPolicy every 5-10 minutes, which triggers an inbound listener reload. You need to be on Istio 1.6+ for this to work (it logs errors otherwise).
- restart-simple-service: every 5-10 minutes, picks a random simple-service and does a rolling restart of it.
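To make the two operator tasks more concrete, here's a minimal Python sketch of the pieces they need. It is not testyomesh's code: the service names, the namespace, the unused path and the policy body are all hypothetical. The policy is assumed to be a DENY rule for a path nothing ever requests, so that creating and deleting it changes sidecar config without rejecting real traffic or widening access; the restart uses the standard kubectl rollout restart command.

```python
import json
import random

# Hypothetical names for illustration; the real tasks live in lib/apps/tasks.
SIMPLE_SERVICES = ["simple-service-1", "simple-service-2", "simple-service-3"]
NAMESPACE = "testyomesh"
UNUSED_PATH = "/testyomesh-never-requested"  # hypothetical path no client calls


def next_delay_seconds(rng: random.Random) -> float:
    """Pick a random wait of 5-10 minutes before the next chaos task runs."""
    return rng.uniform(5 * 60, 10 * 60)


def deny_unused_path_policy(name: str, namespace: str) -> dict:
    """A DENY AuthorizationPolicy for a path no client ever requests.

    Applying or deleting it changes the inbound config pushed to the sidecars
    in the namespace, yet it should never reject real traffic or widen access.
    """
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": "DENY",
            "rules": [{"to": [{"operation": {"paths": [UNUSED_PATH]}}]}],
        },
    }


def restart_command(rng: random.Random) -> list:
    """Build a kubectl command that rolling-restarts one random simple-service."""
    target = rng.choice(SIMPLE_SERVICES)
    return ["kubectl", "rollout", "restart", f"deployment/{target}", "-n", NAMESPACE]


if __name__ == "__main__":
    rng = random.Random()
    print(f"next task in {next_delay_seconds(rng):.0f}s")
    # kubectl accepts JSON manifests, so this can be piped to `kubectl apply -f -`.
    print(json.dumps(deny_unused_path_policy("testyomesh-chaos", NAMESPACE)))
    print(" ".join(restart_command(rng)))
```

Deleting the same policy a few minutes later (kubectl delete authorizationpolicy testyomesh-chaos -n testyomesh, with the hypothetical names above) gives you the second config push.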
load-tester
The load tester looks at your configuration and tries to work out all the different ways it can poke your services, by:
- Calculating every permutation of request chain, e.g. service1 -> service2 -> service3, or perhaps service3 -> service1
- Adding all the different HTTP methods into the mix, e.g. service1 -- GET --> service2, or perhaps service2 -- PATCH --> service1
- Using a mixture of instant and delayed routes, e.g. service1 -- GET /instant --> service3, or perhaps service2 -- POST /delayed --> service3
- Requesting different status codes too, e.g. service1 -- POST /instant?code=204 --> service2
It might seem excessive, but the cardinality of the requests will help you find subtle issues such as this issue with content-encoding on 204 responses.
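The permutation idea above can be sketched in a few lines of Python with itertools. The service names, methods, routes and status codes below are illustrative assumptions rather than the load tester's actual inputs, and for simplicity each scenario uses one method for the whole chain.

```python
from dataclasses import dataclass
from itertools import permutations, product

# Illustrative inputs only; the real load tester derives these from its configuration.
SERVICES = ["service1", "service2", "service3"]
METHODS = ["GET", "POST", "PUT", "PATCH", "DELETE"]
ROUTES = ["/instant", "/delayed"]
CODES = [200, 201, 204]


@dataclass(frozen=True)
class Scenario:
    chain: tuple  # ordered services, e.g. ("service1", "service2", "service3")
    method: str   # one method for the whole chain (a simplification)
    route: str
    code: int

    def describe(self) -> str:
        hops = f" -- {self.method} --> ".join(self.chain)
        return f"{hops} {self.route}?code={self.code}"


def scenarios(services, methods, routes, codes, max_hops=3):
    """Yield every ordered chain of 2..max_hops distinct services, crossed with
    every method, route and status code."""
    for length in range(2, max_hops + 1):
        for chain in permutations(services, length):
            for method, route, code in product(methods, routes, codes):
                yield Scenario(chain, method, route, code)


if __name__ == "__main__":
    all_scenarios = list(scenarios(SERVICES, METHODS, ROUTES, CODES))
    print(len(all_scenarios))
    print(all_scenarios[0].describe())
```

With these inputs there are 12 service chains (6 of length two, 6 of length three), and 12 x 5 x 2 x 3 gives 360 distinct scenarios, which shows how quickly the cardinality grows as you add services.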
simple-service
simple-service is exactly what it says: a simple HTTP web server with a few endpoints for poking:
- /instant?code={code}: returns an instant response with the given status code
- /delayed?code={code}: as /instant, but with a random delay (or a fixed one, if you pass delay={delay})
- /downstream?servers=another-app/instant: tells the server to make a subsequent downstream request to another app, and return an aggregate response
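For a feel of how little such a server needs, here's a minimal Python sketch of the /instant and /delayed endpoints using only the standard library. It is an illustration, not simple-service itself: the delay bounds, the assumption that delay is in milliseconds, the JSON body and port 8080 are all hypothetical, and /downstream is left out.

```python
import json
import random
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical bounds (milliseconds) for the random delay on /delayed.
MIN_DELAY_MS = 100
MAX_DELAY_MS = 1000


def plan_response(raw_path: str, rng=random) -> tuple:
    """Map a request path to (status_code, delay_ms); unknown paths get a 404."""
    url = urlparse(raw_path)
    query = parse_qs(url.query)
    code = int(query.get("code", ["200"])[0])
    if url.path == "/instant":
        return code, 0
    if url.path == "/delayed":
        if "delay" in query:
            return code, int(query["delay"][0])  # fixed delay requested
        return code, rng.randint(MIN_DELAY_MS, MAX_DELAY_MS)
    return 404, 0


class SimpleServiceHandler(BaseHTTPRequestHandler):
    def _respond(self) -> None:
        status, delay_ms = plan_response(self.path)
        time.sleep(delay_ms / 1000)
        self.send_response(status)
        if status == 204:
            # A 204 must not carry a body (or a Content-Length header).
            self.end_headers()
            return
        body = json.dumps({"code": status, "delayMs": delay_ms}).encode()
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # Answer every method the load tester might send in the same way.
    do_GET = do_POST = do_PUT = do_PATCH = do_DELETE = _respond


def serve(port: int = 8080) -> None:
    """Start the sketch server; blocks until interrupted."""
    ThreadingHTTPServer(("0.0.0.0", port), SimpleServiceHandler).serve_forever()
```

Calling serve() and then requesting http://localhost:8080/delayed?code=204&delay=250 should give an empty 204 after roughly a quarter of a second. Keeping the routing in plan_response makes the behaviour easy to test without opening a socket.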
How do I know something's broken?
Well, I'm presuming you're already monitoring, graphing and alerting on the Istio request metrics, such as istio_requests_total{response_code=~"5.*"}. So that's on you.
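If you want a quick programmatic check on top of your dashboards, here's a hedged Python sketch that asks the Prometheus HTTP API (/api/v1/query) for the fraction of mesh requests returning a 5xx. The Prometheus address, the 5-minute window and the ratio query are assumptions to adapt to your own setup.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical in-cluster Prometheus address; adjust for your environment.
PROMETHEUS_URL = "http://prometheus:9090"

# Fraction of mesh requests that returned a 5xx over the last 5 minutes.
ERROR_RATIO_QUERY = (
    'sum(rate(istio_requests_total{response_code=~"5.*"}[5m]))'
    " / sum(rate(istio_requests_total[5m]))"
)


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def first_value(response: dict) -> Optional[float]:
    """Extract the first sample value from an instant-vector query response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    result = response["data"]["result"]
    if not result:
        return None  # no 5xx series (or no traffic) in the window
    return float(result[0]["value"][1])  # value is [timestamp, "string"]


def error_ratio(base: str = PROMETHEUS_URL) -> Optional[float]:
    """Fetch the current 5xx ratio from Prometheus."""
    with urlopen(query_url(base, ERROR_RATIO_QUERY), timeout=10) as resp:
        return first_value(json.load(resp))
```

Since testyomesh's traffic should never legitimately fail, any non-zero ratio during release testing is worth a look.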

How do I install it?
There's a Helm chart in the ./helmfile/charts/testyomesh folder. Alternatively, install helmfile and simply run helmfile sync from the helmfile/ folder.
I'll get around to versioned releases when I have time. Until then, you probably want to store the latest image in your own registry:
docker pull stono/testyomesh:latest
docker tag stono/testyomesh:latest your-registry:whatever
docker push your-registry:whatever
And then update helmfile/charts/testyomesh/values.yaml accordingly.
How about config?
Everything you can currently configure is in ./helmfile/charts/testyomesh/values.yaml. The out-of-the-box configuration gives you 2 load-tester replicas running 30 threads, 3 simple services and 1 operator, which generates around 150 ops/second.
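If you want to tune the load, a rough back-of-the-envelope model helps. The sketch below assumes a closed-loop load tester (each thread waits for a response before sending its next request) and that the 30 threads are per replica; both are assumptions, so check values.yaml. Under that model, 2 x 30 threads at 150 ops/second imply an average round trip of about 0.4 s.

```python
def closed_loop_throughput(replicas: int, threads_per_replica: int, mean_latency_s: float) -> float:
    """Requests/second when every thread waits for each response before sending the next."""
    return replicas * threads_per_replica / mean_latency_s


def implied_mean_latency(replicas: int, threads_per_replica: int, observed_ops: float) -> float:
    """Invert the model: the mean round trip implied by an observed request rate."""
    return replicas * threads_per_replica / observed_ops


if __name__ == "__main__":
    latency = implied_mean_latency(2, 30, 150)
    print(f"implied mean latency: {latency:.2f}s")
    # Doubling the replicas at the same latency roughly doubles the load.
    print(f"4 replicas: {closed_loop_throughput(4, 30, latency):.0f} ops/second")
```

The model ignores things like pod CPU limits and the occasional downstream hop, so treat its numbers as a starting point rather than a prediction.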