
E2E tests are flaking

jsturtevant opened this issue on Sep 16 '22 · 5 comments

Bug description:

I noticed some flakes where random things are failing. I am not sure if these are all related; after looking at logs and commits from the last few days, I don't see anything jumping out as the cause.

  • 9/16/2022 - https://github.com/openservicemesh/osm/actions/runs/3069568739/jobs/4958431162 (kafka doesn't reach consensus)
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
> (kafka-zookeeper-0) ZK status check failed: expected nil err, got command terminated with exit code 1
> (kafka-zookeeper-1) ZK status check succeeded!
> (kafka-zookeeper-2) ZK status check succeeded!
> (kafka-zookeeper-0) Stdout /opt/bitnami/java/bin/java
Client port found: 2181. Client address: localhost. Client SSL: false.
  • 9/16/2022 - https://github.com/openservicemesh/osm/actions/runs/3069634707/jobs/4958564544#step:5:399 (nginx doesn't start)
Deployment is not ready: ingress-ns/ingress-nginx-controller. 0 out of 1 expected pods are ready
  • 9/14/2022 - https://github.com/openservicemesh/osm/actions/runs/3056022379/jobs/4929818398 (tcp server first; see the sketch after this list)
  TCP server-first traffic [It]
  /home/runner/work/osm/osm/tests/e2e/e2e_tcp_server_first_test.go:27

  Timed out after 5.002s.
  Didn't get expected response from server
  Expected
      <string>: 
  to contain substring
      <string>: 
      y
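
For context on the TCP server-first failure: a "Timed out after 5.002s" message is the signature of a Gomega `Eventually` assertion hitting a 5-second deadline. The sketch below shows that pattern in rough form; it is not the actual osm test, and the address, port, marker string, and polling interval are assumptions.

```go
// Rough sketch of the assertion pattern behind the failure above (assumed
// details, not the real tests/e2e/e2e_tcp_server_first_test.go): poll a
// server-first TCP service for up to 5s and expect its greeting to contain
// a marker string.
package e2e_test

import (
	"bufio"
	"net"
	"time"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("TCP server-first traffic", func() {
	It("receives the server's greeting without the client sending first", func() {
		Eventually(func() string {
			// Hypothetical in-cluster address; the real test builds this
			// from its own fixtures.
			conn, err := net.DialTimeout("tcp", "tcp-echo.server.svc.cluster.local:9090", time.Second)
			if err != nil {
				return "" // keep polling until the sidecar lets the connection through
			}
			defer conn.Close()

			// Server-first protocol: the server speaks first, so just read.
			conn.SetReadDeadline(time.Now().Add(time.Second))
			greeting, _ := bufio.NewReader(conn).ReadString('\n')
			return greeting
		}, 5*time.Second, 500*time.Millisecond).
			Should(ContainSubstring("hello"), "Didn't get expected response from server")
	})
})
```

When the proxy or the backend is slow to come up, the 5s window elapses before any response arrives, which matches the empty-string-vs-expected-substring output shown above.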

Affected area (please mark with X where applicable):

  • Install [ ]
  • SMI Traffic Access Policy [ ]
  • SMI Traffic Specs Policy [ ]
  • SMI Traffic Split Policy [ ]
  • Permissive Traffic Policy [ ]
  • Ingress [ ]
  • Egress [ ]
  • Envoy Control Plane [ ]
  • CLI Tool [ ]
  • Metrics [ ]
  • Certificate Management [ ]
  • Sidecar Injection [ ]
  • Logging [ ]
  • Debugging [ ]
  • Tests [ ]
  • Demo [ ]
  • CI System [ ]

Expected behavior:

Steps to reproduce the bug (as precisely as possible):

How was OSM installed?:

Anything else we need to know?:

Bug report archive:

Environment:

  • OSM version (use osm version):
  • Kubernetes version (use kubectl version):
  • Size of cluster (number of worker nodes in the cluster):
  • Others:

jsturtevant · Sep 16 '22

Saw pods time out again:

[debug] Deployment is not ready: osm-system/osm-bootstrap. 0 out of 1 expected pods are ready

https://github.com/openservicemesh/osm/actions/runs/3161948366/jobs/5148127884

jsturtevant · Sep 30 '22

The pod timeout above (osm-system/osm-bootstrap not becoming ready) was a flake introduced by a change and should be resolved by #5191.

jsturtevant · Oct 06 '22

I saw the kafka consensus flake again on https://github.com/openservicemesh/osm/actions/runs/3194154634/jobs/5213475797

jsturtevant · Oct 06 '22

kafka consensus: https://github.com/openservicemesh/osm/actions/runs/3200435451/attempts/1

jsturtevant · Oct 06 '22

One thing I might try is throwing the Kafka test into its own bucket to see if the flakes continue
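
If it helps, here is a rough sketch of one way to do that using plain Ginkgo v2 labels rather than the suite's own bucket plumbing (I'm not assuming its exact API here); the `kafka` label and the filter flags are illustrative.

```go
// Sketch: tag the Kafka suite with its own Ginkgo v2 label so CI can run it
// in a dedicated job. A separate workflow job would filter on the label
// (go test ./tests/e2e -ginkgo.label-filter=kafka) while the existing jobs
// exclude it (-ginkgo.label-filter='!kafka').
package e2e_test

import (
	. "github.com/onsi/ginkgo/v2"
)

var _ = Describe("Kafka traffic through the mesh", Label("kafka"), func() {
	It("produces and consumes messages across meshed pods", func() {
		// existing Kafka/zookeeper test body would move here unchanged
	})
})
```

That way a consensus flake would only retry the Kafka job instead of dragging down everything else in its bucket.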

keithmattix · Oct 07 '22

This issue will be closed due to a long period of inactivity. If you would like this issue to remain open then please comment or update.

github-actions[bot] · Dec 07 '22

Issue closed due to inactivity.

github-actions[bot] · Dec 15 '22