acto
Starter Project #4 Differentiate between misconfiguration and bugs
What we encountered
We found that some test cases generated by Acto contain misconfigurations. For example, in the mutation from state 0 to state 1 (see CRD Definition), Acto adds a livenessProbe override to the custom resource. The override is invalid because RabbitMQ does not use port 8500, so the pod can never pass the liveness check and Kubernetes repeatedly kills it.
The alarm report contains many similar cases, such as an invalid image name or a missing field. This issue aims to solve the problem, or at least mitigate it.
What we could do
- Improve the test cases generated by Acto.
- Collect events and logs from Kubernetes, and classify the alarms.
Improve the test cases generated by Acto
TBD
Collect events (and logs) from Kubernetes, and classify the alarms.
The following event indicates that the pod has an invalid configuration and could not be created, which is different from a crash event. We think this kind of event may indicate a misconfiguration.
Warning FailedCreate 50s (x19 over 5m40s) statefulset-controller create Pod test-cluster-server-2 in StatefulSet test-cluster-server failed error: Pod "test-cluster-server-2" is invalid: spec.containers[0].image: Required value
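As a rough illustration of the classification idea, the sketch below labels a single event from its type, reason, and message. The `Event` class, the `classify_event` and `invalid_field` helpers, the label names, and the matching rules are all assumptions made for this example and are not part of Acto. The "is invalid:" pattern is taken from the event above; the back-off and probe-failure rules are guesses at what a crash or a bad probe override would look like and would need to be checked against real event output.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: validation errors look like the event above,
# i.e. '... is invalid: <field path>: <detail>'.
_VALIDATION_ERROR = re.compile(r"is invalid: (?P<field>[^:]+): (?P<detail>.+)$")


@dataclass
class Event:
    type: str     # "Normal" or "Warning"
    reason: str   # e.g. "FailedCreate"
    message: str


def invalid_field(message: str) -> Optional[str]:
    """Return the field path named in a validation error, if any."""
    match = _VALIDATION_ERROR.search(message)
    return match.group("field") if match else None


def classify_event(event: Event) -> str:
    """Label one event as 'misconfiguration', 'probe_failure', 'crash', or 'unknown'."""
    if event.type != "Warning":
        return "unknown"
    # The object was rejected before any pod ran: likely a bad input.
    if event.reason == "FailedCreate" and invalid_field(event.message):
        return "misconfiguration"
    # Assumed rule: a failing liveness probe is reported separately so that
    # a human can decide whether the probe itself was misconfigured.
    if event.reason == "Unhealthy" and event.message.startswith("Liveness probe failed"):
        return "probe_failure"
    # Assumed rule: a container that keeps restarting is treated as a crash.
    if event.reason == "BackOff" and "restarting failed container" in event.message:
        return "crash"
    return "unknown"


example = Event(
    type="Warning",
    reason="FailedCreate",
    message=(
        "create Pod test-cluster-server-2 in StatefulSet test-cluster-server "
        'failed error: Pod "test-cluster-server-2" is invalid: '
        "spec.containers[0].image: Required value"
    ),
)
print(classify_event(example), invalid_field(example.message))
```

Reporting the offending field path alongside the label makes it easier to trace an alarm back to the mutation that caused it.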
CRD Definition
Mutation:
$ diff mutated-0.yaml mutated-1.yaml
>   override:
>     statefulSet:
>       spec:
>         template:
>           spec:
>             containers:
>             - livenessProbe:
>                 httpGet:
>                   port: 8500
>                 initialDelaySeconds: 10
>               name: b
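The same comparison can be made on the parsed custom resources instead of on text. The sketch below is a minimal, hypothetical helper (`added_paths` is not an Acto function) that walks two nested dict/list structures and yields the path of each value that is set in the new state but absent or null in the old one. The two states in the example are trimmed-down stand-ins for the full resources.

```python
from typing import Any, Iterator, Tuple


def added_paths(old: Any, new: Any, path: Tuple = ()) -> Iterator[Tuple]:
    """Yield the path of every value set in `new` but missing or null in `old`."""
    if isinstance(new, dict):
        old = old if isinstance(old, dict) else {}
        for key, value in new.items():
            if old.get(key) is None and value is not None:
                # The whole subtree is new; report its root only.
                yield path + (key,)
            else:
                yield from added_paths(old.get(key), value, path + (key,))
    elif isinstance(new, list):
        old = old if isinstance(old, list) else []
        for index, value in enumerate(new):
            if index >= len(old):
                yield path + (index,)
            else:
                yield from added_paths(old[index], value, path + (index,))


# Trimmed-down stand-ins for state 0 and state 1.
state0 = {"spec": {"image": None, "replicas": 3}}
state1 = {
    "spec": {
        "image": None,
        "replicas": 3,
        "override": {"statefulSet": {"spec": {"template": {"spec": {
            "containers": [{
                "name": "b",
                "livenessProbe": {"httpGet": {"port": 8500},
                                  "initialDelaySeconds": 10},
            }],
        }}}}},
    }
}

added = list(added_paths(state0, state1))
print(added)
```

This helper only reports additions; changed or removed values are ignored, which is enough for the mutation shown here but not for mutations in general.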
The following custom resources demonstrate the mutation.
State 0:
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: test-cluster
  namespace: rabbitmq-system
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: null
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - test-cluster
        topologyKey: kubernetes.io/hostname
  image: null
  imagePullSecrets: null
  persistence:
    storage: 50Gi
  rabbitmq:
    additionalConfig: |
      cluster_partition_handling = pause_minority
      vm_memory_high_watermark_paging_ratio = 0.99
      disk_free_limit.relative = 1.0
      collect_statistics_interval = 10000
  replicas: 3
  resources:
    limits:
      cpu: 1
      memory: 4Gi
    requests:
      cpu: 1
      memory: 4Gi
  secretBackend: null
  service:
    type: ClusterIP
  skipPostDeploySteps: false
  terminationGracePeriodSeconds: 1024
  tls:
    caSecretName: null
    disableNonTLSListeners: false
    secretName: null
  tolerations: null
State 1:
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: test-cluster
  namespace: rabbitmq-system
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: null
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - test-cluster
        topologyKey: kubernetes.io/hostname
  image: null
  imagePullSecrets: null
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers:
            - livenessProbe:
                httpGet:
                  port: 8500
                initialDelaySeconds: 10
              name: b
  persistence:
    storage: 50Gi
  rabbitmq:
    additionalConfig: |
      cluster_partition_handling = pause_minority
      vm_memory_high_watermark_paging_ratio = 0.99
      disk_free_limit.relative = 1.0
      collect_statistics_interval = 10000
  replicas: 3
  resources:
    limits:
      cpu: 1
      memory: 4Gi
    requests:
      cpu: 1
      memory: 4Gi
  secretBackend: null
  service:
    type: ClusterIP
  skipPostDeploySteps: false
  terminationGracePeriodSeconds: 1024
  tls:
    caSecretName: null
    disableNonTLSListeners: false
    secretName: null
  tolerations: null