Safety check throughout testing procedure

Open kosyd opened this issue 3 years ago • 2 comments

⭐ Following up from NA KubeCon 2021 ⭐ Per my discussion with Lalith Suresh and Xudong Sun

Safety checking throughout the testing process would be invaluable for our controller, which enforces a set order of dependencies between the pods of a group of Deployments.

For example, we have 3 Deployments: First, Second, and Third. The pods of Second rely on the pods of First being available, and the pods of Third rely on the pods of both Second and First.

For simplicity, imagine that all three Deployments have the same number of replicas, e.g. 5; in reality we calculate this based on a ratio between the Deployments*.

Consider the case where each Deployment is expected to have 5 replicas at the end of the roll and the dependency structure is as described above:

Time  First  Second  Third
0     0      0       0
1     1      0       0
2     2      1       0
3     3      1       1
4     4      2       2
5     4      3       3
6     5      3       3
7     5      4       3
8     5      5       4
9     5      5       5

We would like to be able to check that we are not violating this dependency tree while Deployments are becoming available.
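As a rough illustration of the check being asked for, the roll in the table can be validated step by step. This is only a sketch: the function name and data layout are hypothetical, and it assumes the equal-replica case, where "relies on" is read as "a dependent Deployment never has more available pods than any Deployment it depends on".

```python
# Hypothetical sketch; none of these names come from Sieve or the controller.

# (time, First, Second, Third) available pod counts, copied from the table.
TIMELINE = [
    (0, 0, 0, 0), (1, 1, 0, 0), (2, 2, 1, 0), (3, 3, 1, 1), (4, 4, 2, 2),
    (5, 4, 3, 3), (6, 5, 3, 3), (7, 5, 4, 3), (8, 5, 5, 4), (9, 5, 5, 5),
]


def find_violations(timeline):
    """Return the time steps at which the dependency ordering is violated.

    Assumption: Second depends on First, and Third depends on both
    Second and First, all with the same target replica count.
    """
    violations = []
    for t, first, second, third in timeline:
        if second > first or third > min(first, second):
            violations.append(t)
    return violations


print(find_violations(TIMELINE))  # [] (the roll in the table is safe)
```

A step such as `(0, 0, 1, 0)`, where Second has a pod before First does, would be reported as a violation.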

* For example, First can have 7 replicas, Second can have 5, and Third can have 13. The ratio between First and Second would then be 5/7 ≈ 0.7 pods of Second for every pod of First, and similarly for the ratios between Second and Third, First and Third, etc.
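The ratio arithmetic in this footnote can be sketched as follows. The helper name is hypothetical, and rounding down to a whole number of pods is an assumption; the issue does not say how fractional pods are handled.

```python
from fractions import Fraction
from math import floor


def max_dependent_pods(dependency_available, dependency_target, dependent_target):
    """Upper bound on dependent pods for a given number of available dependency pods.

    Assumption: the bound is the target ratio times the available
    dependency pods, rounded down.
    """
    ratio = Fraction(dependent_target, dependency_target)
    return floor(dependency_available * ratio)


# Targets from the footnote: First = 7, Second = 5, Third = 13.
print(float(Fraction(5, 7)))         # about 0.714 pods of Second per pod of First
print(max_dependent_pods(1, 7, 5))   # 0
print(max_dependent_pods(7, 7, 5))   # 5
print(max_dependent_pods(5, 5, 13))  # 13
```

Exact fractions are used instead of floats so that a full set of dependency pods always maps to exactly the dependent's target count.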

kosyd • Oct 21 '21 15:10

@kosyd Thank you so much for describing the example! And yes, checking safety properties is a very important feature that we are implementing.

To check safety properties, Sieve automatically instruments the apiserver and records the execution history of the whole k8s cluster by intercepting creation, deletion, and update events for all kinds of resources (both custom resources and k8s core resources). That way, Sieve captures every single change to each resource and can check the state of any object at any point by scanning the recorded history.
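The idea of recovering the state of any object at any point from a recorded event history can be sketched as below. This is not Sieve's implementation; the `Event` type, its fields, and `state_at` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

ResourceKey = Tuple[str, str, str]  # (kind, namespace, name)


@dataclass
class Event:
    seq: int                        # position in the recorded history
    op: str                         # "create", "update", or "delete"
    key: ResourceKey
    obj: Optional[Dict[str, Any]]   # object after the event; None for "delete"


def state_at(history: List[Event], seq: int) -> Dict[ResourceKey, Dict[str, Any]]:
    """Replay events up to and including `seq` and return the live objects."""
    state: Dict[ResourceKey, Dict[str, Any]] = {}
    for ev in sorted(history, key=lambda e: e.seq):
        if ev.seq > seq:
            break
        if ev.op == "delete":
            state.pop(ev.key, None)
        else:
            state[ev.key] = ev.obj
    return state


key = ("Deployment", "default", "first")
history = [
    Event(1, "create", key, {"availableReplicas": 0}),
    Event(2, "update", key, {"availableReplicas": 3}),
    Event(3, "delete", key, None),
]
print(state_at(history, 2)[key]["availableReplicas"])  # 3
print(key in state_at(history, 3))                     # False
```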

For now, this feature is only for internal use and we are actively developing the APIs (and documentation) for external users to customize their oracles in a declarative way and check for various safety properties. In the particular example you described, the user can write a declarative oracle asserting that "the pod ratio between two specific deployments should always be no larger than X%". After that Sieve will traverse the history and check the property at every point in the history where the deployment has changed. If the ratio is incorrect at any point, Sieve will capture it.
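A minimal sketch of such a ratio oracle, written against a simplified history format, might look like the following. The function name, the `(seq, deployment, available)` tuples, and the `max_ratio` parameter are all assumptions made for illustration; they are not Sieve's oracle API.

```python
def check_ratio_oracle(history, dependency, dependent, max_ratio):
    """Return the history points where dependent/dependency exceeds max_ratio.

    `history` is a list of (seq, deployment_name, available_replicas)
    entries in recorded order. The property is re-checked only when one
    of the two Deployments changes.
    """
    available = {dependency: 0, dependent: 0}
    violations = []
    for seq, name, replicas in history:
        if name not in available:
            continue  # unrelated resource; the property cannot have changed
        available[name] = replicas
        # Compare by multiplication so zero dependency pods needs no division.
        if available[dependent] > max_ratio * available[dependency]:
            violations.append(seq)
    return violations


history = [
    (1, "first", 1),
    (2, "second", 1),
    (3, "second", 2),  # Second gets ahead of First here
    (4, "first", 2),
]
print(check_ratio_oracle(history, "first", "second", max_ratio=1.0))  # [3]
```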

We will definitely let you know when the feature is ready for use and we will also prepare some examples on how to customize the declarative oracles.

marshtompsxd • Oct 22 '21 21:10

Hello @kosyd, sorry for the late response. Sieve currently implements a customized safety checker that lets users specify any safety property over any resource and checks it throughout the testing procedure. You can find examples here: https://github.com/sieve-project/sieve/blob/main/sieve_oracle/customized_safety_checker.py

To implement a safety checker, a user only needs to specify the checker function and the resource to check.

Could you point us to a concrete controller whose safety property you want to check, so that we can tell whether this is exactly the feature you need? It would also be great if you could try Sieve on some of your controllers and send us any feedback.

marshtompsxd • Jul 15 '22 05:07