sieve
Safety check throughout testing procedure
⭐ Following up from NA KubeCon 2021 ⭐ Per my discussion with Lalith Suresh and Xudong Sun:
Safety checking throughout the testing process would be invaluable for our controller, which enforces a set order of dependencies between the pods of a group of Deployments.
For example, we have 3 Deployments, `First`, `Second`, and `Third`, where the pods of `Second` rely on the pods of `First` being available, and the pods of `Third` rely on the pods of both `Second` and `First`.
For simplicity, imagine that all three Deployments have the same number of replicas, e.g. 5; in reality, however, we calculate the replica counts from a ratio between the Deployments*.
Consider the case where each Deployment is expected to have 5 replicas at the end of the rollout and the dependency structure is as described above. The table shows the number of available pods of each Deployment at each time step:
Time | First | Second | Third |
---|---|---|---|
0 | 0 | 0 | 0 |
1 | 1 | 0 | 0 |
2 | 2 | 1 | 0 |
3 | 3 | 1 | 1 |
4 | 4 | 2 | 2 |
5 | 4 | 3 | 3 |
6 | 5 | 3 | 3 |
7 | 5 | 4 | 3 |
8 | 5 | 5 | 4 |
9 | 5 | 5 | 5 |
We would like to be able to check that we are not violating this dependency tree while Deployments are becoming available.
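A minimal sketch of that check over the table above, assuming the equal-replica simplification: at every time step, a Deployment may not have more available pods than any Deployment it depends on. The `DEPENDS_ON` map and `check_history` helper are hypothetical names for illustration, not part of Sieve.

```python
# Hypothetical dependency map: each Deployment lists the Deployments it relies on.
DEPENDS_ON = {
    "First": [],
    "Second": ["First"],
    "Third": ["First", "Second"],
}

# Available pods per time step, copied from the table above.
HISTORY = [
    {"First": 0, "Second": 0, "Third": 0},
    {"First": 1, "Second": 0, "Third": 0},
    {"First": 2, "Second": 1, "Third": 0},
    {"First": 3, "Second": 1, "Third": 1},
    {"First": 4, "Second": 2, "Third": 2},
    {"First": 4, "Second": 3, "Third": 3},
    {"First": 5, "Second": 3, "Third": 3},
    {"First": 5, "Second": 4, "Third": 3},
    {"First": 5, "Second": 5, "Third": 4},
    {"First": 5, "Second": 5, "Third": 5},
]


def check_history(history, depends_on):
    """Return (time, dependent, dependency) triples where the invariant breaks.

    Assumes equal replica counts, so a Deployment may never have more
    available pods than any Deployment it depends on.
    """
    violations = []
    for time, snapshot in enumerate(history):
        for dependent, deps in depends_on.items():
            for dep in deps:
                if snapshot[dependent] > snapshot[dep]:
                    violations.append((time, dependent, dep))
    return violations


print(check_history(HISTORY, DEPENDS_ON))  # → []
```

A bad snapshot such as `{"First": 1, "Second": 2, "Third": 0}` would be reported as `(0, "Second", "First")`.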
- For example, `First` can have 7 replicas, `Second` 5, and `Third` 13. The ratio between `First` and `Second` means that for every pod of `First` we can have 1/7 × 5 ≈ 0.7 pods of `Second`; the ratios between `Second` and `Third` and between `First` and `Third` work the same way.
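With unequal replica counts, the same check could scale each dependency by the ratio of target replica counts. This sketch assumes the number of available pods allowed for a dependent Deployment is the dependency's available pods times `target(dependent) / target(dependency)`, rounded down; the rounding rule and the `allowed_pods` helper are assumptions, since the exact calculation is not described in this thread.

```python
import math
from fractions import Fraction

# Target replica counts from the example above.
TARGETS = {"First": 7, "Second": 5, "Third": 13}


def allowed_pods(dependent, dependency, available, targets=TARGETS):
    """Upper bound on available pods of `dependent`, given `available` pods of `dependency`.

    Hypothetical rule: scale by targets[dependent] / targets[dependency]
    and round down. Fraction avoids floating-point error at exact multiples.
    """
    ratio = Fraction(targets[dependent], targets[dependency])
    return math.floor(ratio * available)


print(float(Fraction(5, 7)))               # roughly 0.714, the "~0.7" above
print(allowed_pods("Second", "First", 7))  # → 5
print(allowed_pods("Second", "First", 3))  # → 2
print(allowed_pods("Third", "Second", 2))  # → 5
```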
@kosyd Thank you so much for describing the example! And yes, checking safety properties is a very important feature that we are implementing.
To check safety properties, Sieve automatically instruments the API server and records the execution history of the whole k8s cluster by intercepting creation, deletion, and update events for all kinds of resources (both custom resources and k8s core resources). This way, Sieve captures every single change to each resource and can check the state of any object at any point by scanning the recorded history.
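Recovering an object's state at any point from the recorded history can be sketched as a replay over an ordered event log. The event tuple format (`ADDED`/`MODIFIED`/`DELETED`, a resource key, and the object) is an assumption modeled loosely on Kubernetes watch event types, not Sieve's actual trace format; `replay` and `state_at` are hypothetical helpers.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical recorded event: (event type, resource key, object state).
# The types mirror Kubernetes watch events; Sieve's real trace format may differ.
Event = Tuple[str, str, dict]


def replay(events: List[Event]) -> Iterator[Tuple[int, Dict[str, dict]]]:
    """Yield (index, cluster state) after applying each recorded event in order."""
    state: Dict[str, dict] = {}
    for i, (etype, key, obj) in enumerate(events):
        if etype in ("ADDED", "MODIFIED"):
            state[key] = obj  # latest version of the object wins
        elif etype == "DELETED":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown event type: {etype}")
        # Shallow copy so earlier snapshots are not changed by later events.
        yield i, dict(state)


def state_at(events: List[Event], index: int, key: str) -> Optional[dict]:
    """Return the state of `key` right after event `index`, or None if it does not exist."""
    for i, snapshot in replay(events):
        if i == index:
            return snapshot.get(key)
    raise IndexError(index)


events = [
    ("ADDED", "deployment/default/first", {"availableReplicas": 0}),
    ("MODIFIED", "deployment/default/first", {"availableReplicas": 1}),
    ("DELETED", "deployment/default/first", {}),
]
print(state_at(events, 1, "deployment/default/first"))  # → {'availableReplicas': 1}
print(state_at(events, 2, "deployment/default/first"))  # → None
```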
For now, this feature is only for internal use, and we are actively developing the APIs (and documentation) that let external users customize their oracles declaratively and check various safety properties. For the example you described, a user could write a declarative oracle asserting that "the pod ratio between two specific Deployments should never exceed X%". Sieve then traverses the history and checks the property at every point where either Deployment changed. If the ratio is violated at any point, Sieve reports it.
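A declarative ratio oracle of that kind might look like the sketch below: the property is plain data, and a generic checker walks a list of cluster snapshots, evaluating it only where the relevant available counts changed. The `ORACLE` schema, the `availableReplicas` field lookup, and `check_ratio_oracle` are assumptions for illustration, not Sieve's oracle API.

```python
# Hypothetical declarative oracle: available pods of `dependent` must never
# exceed `max_percent`% of the available pods of `dependency`.
ORACLE = {
    "dependent": "deployment/default/second",
    "dependency": "deployment/default/first",
    "max_percent": 100,
}


def available(snapshot, key):
    """Available replicas of a Deployment in a snapshot; 0 if absent."""
    return snapshot.get(key, {}).get("availableReplicas", 0)


def check_ratio_oracle(snapshots, oracle):
    """Return indices of snapshots where the oracle is violated."""
    violations = []
    previous = None
    for i, snapshot in enumerate(snapshots):
        current = (available(snapshot, oracle["dependent"]),
                   available(snapshot, oracle["dependency"]))
        if current == previous:
            continue  # neither available count changed, nothing new to check
        previous = current
        dependent, dependency = current
        # dependent / dependency <= max_percent / 100, cross-multiplied so a
        # dependency with 0 available pods does not cause division by zero.
        if dependent * 100 > oracle["max_percent"] * dependency:
            violations.append(i)
    return violations


snapshots = [
    {"deployment/default/first": {"availableReplicas": 1},
     "deployment/default/second": {"availableReplicas": 0}},
    {"deployment/default/first": {"availableReplicas": 1},
     "deployment/default/second": {"availableReplicas": 2}},
]
print(check_ratio_oracle(snapshots, ORACLE))  # → [1]
```

Keeping the property as data means the same traversal code serves every ratio oracle; only the dictionary changes.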
We will definitely let you know when the feature is ready for use and we will also prepare some examples on how to customize the declarative oracles.
Hello @kosyd, sorry for the late response. Sieve now implements a customized safety checker that lets users specify any safety property over any resource and check it throughout the testing procedure. You can find examples here: https://github.com/sieve-project/sieve/blob/main/sieve_oracle/customized_safety_checker.py
To implement a safety checker, all a user needs to do is specify the checker function and the resource to check.
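The "checker function plus resource" pairing can be sketched as follows. This is a hypothetical illustration of the idea only; the actual interface is in the linked `customized_safety_checker.py`, and `at_most_available`, `CHECKERS`, and `run_checkers` are made-up names. The example property (a Deployment never reports more than 5 available pods, matching the target in the example above) is likewise assumed.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical checker: receives one resource's state (or None if absent)
# and returns True when the safety property holds.
Checker = Callable[[Optional[dict]], bool]


def at_most_available(limit: int) -> Checker:
    """Build a checker asserting a Deployment never reports more than `limit` available pods."""
    def check(obj: Optional[dict]) -> bool:
        if obj is None:
            return True  # resource not created yet: nothing to violate
        return obj.get("status", {}).get("availableReplicas", 0) <= limit
    check.__name__ = f"at_most_available_{limit}"
    return check


# Each entry pairs the resource to check with its checker function.
CHECKERS: List[Tuple[str, Checker]] = [
    ("deployment/default/first", at_most_available(5)),
]


def run_checkers(snapshots: List[Dict[str, dict]],
                 checkers: List[Tuple[str, Checker]]) -> List[Tuple[int, str, str]]:
    """Run every checker on its resource in every snapshot; return the failures."""
    failures = []
    for i, snapshot in enumerate(snapshots):
        for key, checker in checkers:
            if not checker(snapshot.get(key)):
                failures.append((i, key, checker.__name__))
    return failures


snapshots = [
    {"deployment/default/first": {"status": {"availableReplicas": 5}}},
    {"deployment/default/first": {"status": {"availableReplicas": 6}}},
]
print(run_checkers(snapshots, CHECKERS))
# → [(1, 'deployment/default/first', 'at_most_available_5')]
```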
Could you point us to a concrete controller for which you want to check this safety property, so that we can confirm whether this is the exact feature you need? It would also be great if you could try Sieve on some of your controllers and share any feedback.