E2E tests are flaking
Bug description:
I noticed some flakes where random things are failing. I'm not sure whether these are all related, and after looking at the logs and commits from the last few days, nothing jumps out as the cause.
- 9/16/2022 - https://github.com/openservicemesh/osm/actions/runs/3069568739/jobs/4958431162 (Kafka doesn't reach consensus)

  ```
  Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
  > (kafka-zookeeper-0) ZK status check failed: expected nil err, got command terminated with exit code 1
  > (kafka-zookeeper-1) ZK status check succeeded!
  > (kafka-zookeeper-2) ZK status check succeeded!
  > (kafka-zookeeper-0) Stdout /opt/bitnami/java/bin/java
  Client port found: 2181. Client address: localhost. Client SSL: false.
  ```
- 9/16/2022 - https://github.com/openservicemesh/osm/actions/runs/3069634707/jobs/4958564544#step:5:399 (nginx doesn't start)

  ```
  Deployment is not ready: ingress-ns/ingress-nginx-controller. 0 out of 1 expected pods are ready
  ```
- 9/14/2022 - https://github.com/openservicemesh/osm/actions/runs/3056022379/jobs/4929818398 (TCP server-first)

  ```
  TCP server-first traffic [It]
  /home/runner/work/osm/osm/tests/e2e/e2e_tcp_server_first_test.go:27
  Timed out after 5.002s.
  Didn't get expected response from server
  Expected
      <string>:
  to contain substring
      <string>: y
  ```
Affected area (please mark with X where applicable):
- Install [ ]
- SMI Traffic Access Policy [ ]
- SMI Traffic Specs Policy [ ]
- SMI Traffic Split Policy [ ]
- Permissive Traffic Policy [ ]
- Ingress [ ]
- Egress [ ]
- Envoy Control Plane [ ]
- CLI Tool [ ]
- Metrics [ ]
- Certificate Management [ ]
- Sidecar Injection [ ]
- Logging [ ]
- Debugging [ ]
- Tests [x]
- Demo [ ]
- CI System [ ]
Expected behavior:
Steps to reproduce the bug (as precisely as possible):
How was OSM installed?:
Anything else we need to know?:
Bug report archive:
Environment:
- OSM version (use `osm version`):
- Kubernetes version (use `kubectl version`):
- Size of cluster (number of worker nodes in the cluster):
- Others:
I see the pods time out again:

```
[debug] Deployment is not ready: osm-system/osm-bootstrap. 0 out of 1 expected pods are ready
```

https://github.com/openservicemesh/osm/actions/runs/3161948366/jobs/5148127884
This was a flake introduced by a change and should be resolved by #5191.
I saw the kafka consensus flake again on https://github.com/openservicemesh/osm/actions/runs/3194154634/jobs/5213475797
kafka consensus: https://github.com/openservicemesh/osm/actions/runs/3200435451/attempts/1
One thing I might try is moving the Kafka test into its own bucket to see if the flakes continue.
This issue will be closed due to a long period of inactivity. If you would like this issue to remain open then please comment or update.
Issue closed due to inactivity.