
E2E tests are flaking

jsturtevant opened this issue on Sep 16 '22 · 5 comments

Bug description:

I noticed some flakes where random things are failing. I am not sure if these are all related; after looking at logs and commits from the last few days, I don't see anything jumping out as the cause.

  • 9/16/2022 - https://github.com/openservicemesh/osm/actions/runs/3069568739/jobs/4958431162 (kafka doesn't reach consensus)
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
> (kafka-zookeeper-0) ZK status check failed: expected nil err, got command terminated with exit code 1
> (kafka-zookeeper-1) ZK status check succeeded!
> (kafka-zookeeper-2) ZK status check succeeded!
> (kafka-zookeeper-0) Stdout /opt/bitnami/java/bin/java
Client port found: 2181. Client address: localhost. Client SSL: false.
  • 9/16/2022 - https://github.com/openservicemesh/osm/actions/runs/3069634707/jobs/4958564544#step:5:399 (nginx doesn't start)
Deployment is not ready: ingress-ns/ingress-nginx-controller. 0 out of 1 expected pods are ready
  • 9/14/2022 - https://github.com/openservicemesh/osm/actions/runs/3056022379/jobs/4929818398 (tcp server first; see the sketch after this list)
  TCP server-first traffic [It]
  /home/runner/work/osm/osm/tests/e2e/e2e_tcp_server_first_test.go:27

  Timed out after 5.002s.
  Didn't get expected response from server
  Expected
      <string>: 
  to contain substring
      <string>: 
      y
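
For context on the TCP server-first failure: a "Timed out after 5.002s" message is the signature of a Gomega `Eventually` assertion hitting a 5-second deadline. The sketch below shows that pattern in rough form; it is not the actual osm test, and the address, port, marker string, and polling interval are assumptions.

```go
// Rough sketch of the assertion pattern behind the failure above (assumed
// details, not the real tests/e2e/e2e_tcp_server_first_test.go): poll a
// server-first TCP service for up to 5s and expect its greeting to contain
// a marker string.
package e2e_test

import (
	"bufio"
	"net"
	"time"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("TCP server-first traffic", func() {
	It("receives the server's greeting without the client sending first", func() {
		Eventually(func() string {
			// Hypothetical in-cluster address; the real test builds this
			// from its own fixtures.
			conn, err := net.DialTimeout("tcp", "tcp-echo.server.svc.cluster.local:9090", time.Second)
			if err != nil {
				return "" // keep polling until the sidecar lets the connection through
			}
			defer conn.Close()

			// Server-first protocol: the server speaks first, so just read.
			conn.SetReadDeadline(time.Now().Add(time.Second))
			greeting, _ := bufio.NewReader(conn).ReadString('\n')
			return greeting
		}, 5*time.Second, 500*time.Millisecond).
			Should(ContainSubstring("hello"), "Didn't get expected response from server")
	})
})
```

When the proxy or the backend is slow to come up, the 5s window elapses before any response arrives, which matches the empty-string-vs-expected-substring output shown above.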

Affected area (please mark with X where applicable):

  • Install [ ]
  • SMI Traffic Access Policy [ ]
  • SMI Traffic Specs Policy [ ]
  • SMI Traffic Split Policy [ ]
  • Permissive Traffic Policy [ ]
  • Ingress [ ]
  • Egress [ ]
  • Envoy Control Plane [ ]
  • CLI Tool [ ]
  • Metrics [ ]
  • Certificate Management [ ]
  • Sidecar Injection [ ]
  • Logging [ ]
  • Debugging [ ]
  • Tests [ ]
  • Demo [ ]
  • CI System [ ]

Expected behavior:

Steps to reproduce the bug (as precisely as possible):

How was OSM installed?:

Anything else we need to know?:

Bug report archive:

Environment:

  • OSM version (use osm version):
  • Kubernetes version (use kubectl version):
  • Size of cluster (number of worker nodes in the cluster):
  • Others:

jsturtevant · Sep 16 '22

Saw pods time out again:

[debug] Deployment is not ready: osm-system/osm-bootstrap. 0 out of 1 expected pods are ready

https://github.com/openservicemesh/osm/actions/runs/3161948366/jobs/5148127884

jsturtevant · Sep 30 '22

The pod timeout above (osm-system/osm-bootstrap not becoming ready) was a flake introduced by a change and should be resolved by #5191.

jsturtevant · Oct 06 '22

I saw the kafka consensus flake again on https://github.com/openservicemesh/osm/actions/runs/3194154634/jobs/5213475797

jsturtevant · Oct 06 '22

kafka consensus: https://github.com/openservicemesh/osm/actions/runs/3200435451/attempts/1

jsturtevant · Oct 06 '22

One thing I might try is throwing the Kafka test into its own bucket to see if the flakes continue
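
If it helps, here is a rough sketch of one way to do that using plain Ginkgo v2 labels rather than the suite's own bucket plumbing (I'm not assuming its exact API here); the `kafka` label and the filter flags are illustrative.

```go
// Sketch: tag the Kafka suite with its own Ginkgo v2 label so CI can run it
// in a dedicated job. A separate workflow job would filter on the label
// (go test ./tests/e2e -ginkgo.label-filter=kafka) while the existing jobs
// exclude it (-ginkgo.label-filter='!kafka').
package e2e_test

import (
	. "github.com/onsi/ginkgo/v2"
)

var _ = Describe("Kafka traffic through the mesh", Label("kafka"), func() {
	It("produces and consumes messages across meshed pods", func() {
		// existing Kafka/zookeeper test body would move here unchanged
	})
})
```

That way a consensus flake would only retry the Kafka job instead of dragging down everything else in its bucket.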

keithmattix · Oct 07 '22

This issue will be closed due to a long period of inactivity. If you would like this issue to remain open then please comment or update.

github-actions[bot] · Dec 07 '22

Issue closed due to inactivity.

github-actions[bot] · Dec 15 '22