k8s regression test (istio) flake
Which tests failed?
The failing spec was [It] with matching port and target port, at /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:113.
Initial Investigation
The resource state never moves from "Pending" (0) to "Accepted" (1):
[FAILED] Timed out after 30.001s.
Expected
<string>: Status
to match fields: {
.State:
Expected
<core.Status_State>: 0
to equal
<core.Status_State>: 1
}
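For context, this matcher output looks like what a Gomega gstruct field match on a solo-kit core.Status produces when a 30s Eventually poll never sees the state leave Pending (0) for Accepted (1). Below is a minimal sketch of that style of assertion; readVirtualServiceStatus and the spec wiring are assumptions for illustration, not the code at istio_integration_test.go:113:

```go
package istio_test

import (
	"time"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	"github.com/onsi/gomega/gstruct"

	"github.com/solo-io/solo-kit/pkg/api/v1/resources/core"
)

// readVirtualServiceStatus is a hypothetical stand-in for however the suite
// actually reads the resource status from the cluster; the real assertion
// lives in istio_integration_test.go and is not reproduced here.
func readVirtualServiceStatus() core.Status {
	return core.Status{} // zero value => State == core.Status_Pending (0)
}

var _ = It("waits up to 30s for the status to become Accepted (sketch)", func() {
	Eventually(readVirtualServiceStatus, 30*time.Second, time.Second).Should(
		// core.Status_Pending is 0 and core.Status_Accepted is 1, which is why
		// the failure prints "<core.Status_State>: 0 to equal <core.Status_State>: 1".
		gstruct.MatchFields(gstruct.IgnoreExtras, gstruct.Fields{
			"State": Equal(core.Status_Accepted),
		}),
	)
})
```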
Additional Information
Failure seen in this run: https://github.com/solo-io/gloo/actions/runs/5798585599/attempts/1
Hit this one again: https://github.com/solo-io/gloo/actions/runs/7410687254/job/20163550838?pr=9030
Passed on the 4th try. Seems to be failing on a different line though:
[FAILED] [30.019 seconds]
Gloo + Istio integration tests Istio mTLS [BeforeEach] strict peer auth when mtls is enabled for the upstream should make a request with the expected cert header
[BeforeEach] /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:315
[It] /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:479
[FAILED] Timed out after 30.000s.
Expected
<string>: Status
to match fields: {
.State:
Expected
<core.Status_State>: 0
to equal
<core.Status_State>: 1
}
In [BeforeEach] at: /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:323 @ 01/04/24 14:29:52.806
Full Stack Trace
github.com/solo-io/gloo/test/kube2e/istio_test.glob..func1.3.1()
/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:323 +0x245
------------------------------
SSS
Summarizing 1 Failure:
[FAIL] Gloo + Istio integration tests Istio mTLS [BeforeEach] strict peer auth when mtls is enabled for the upstream should make a request with the expected cert header
/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:323
Ran 7 of 10 Specs in 139.834 seconds
FAIL! -- 6 Passed | 1 Failed | 0 Pending | 3 Skipped
--- FAIL: TestIstio (139.85s)
FAIL
The only error in the logs comes from the EDS watcher. In the Istio integration test, the status check fails when this error occurs on the control plane, because the upstream is not picked up when the EDS plugin runs:
{"level":"error","ts":"2024-03-11T23:07:00.229Z","logger":"gloo.v1.event_loop.setup.v1.event_loop.syncer.kubernetes_eds","caller":"kubernetes/eds.go:215","msg":"upstream gloo-system.gloo-system-testserver-1234: port 1234 not found for service testserver","version":"1.0.0-ci","stacktrace":"github.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).List\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:215\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func1\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:236\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func2\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:263"}
{"level":"error","ts":"2024-03-11T23:07:00.229Z","logger":"gloo.v1.event_loop.setup.v1.event_loop.syncer.kubernetes_eds","caller":"kubernetes/eds.go:215","msg":"upstream gloo-system.kube-svc:gloo-system-testserver-1234: port 1234 not found for service testserver","version":"1.0.0-ci","stacktrace":"github.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).List\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:215\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func1\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:236\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func2\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:263"}
You can reproduce this by focusing the port settings Istio regression tests, running with --until-it-fails, and preventing cleanup by adding a CurrentSpecReport().Failed() check to the AfterEach() so that failed specs skip cleanup.
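For the cleanup-skipping part, a minimal Ginkgo v2 sketch is shown below, assuming a placeholder deleteTestResources helper rather than the suite's real teardown; the run itself would be something like ginkgo --until-it-fails --focus 'port settings' against the kube2e istio suite:

```go
package istio_test

import (
	. "github.com/onsi/ginkgo/v2"
)

// deleteTestResources is a placeholder for whatever the suite's real
// AfterEach tears down; it is not an actual helper in the repo.
func deleteTestResources() {}

var _ = Describe("port settings (reproduction sketch)", func() {
	AfterEach(func() {
		// Leave the cluster dirty when a spec fails so the stuck
		// VirtualService/Upstream/Endpoints can be inspected with kubectl.
		if CurrentSpecReport().Failed() {
			return
		}
		deleteTestResources()
	})
})
```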
This results in the VirtualService being stuck in an empty status state even though the other resources are valid:
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
creationTimestamp: "2024-03-11T23:07:01Z"
generation: 1
name: testserver
namespace: gloo-system
resourceVersion: "20441"
uid: 5627916c-2aea-4681-8f91-9dc372552de2
spec:
virtualHost:
domains:
- testserver
routes:
- matchers:
- prefix: /
routeAction:
single:
upstream:
name: gloo-system-testserver-1234
namespace: gloo-system
status:
statuses: {}
---
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
creationTimestamp: "2024-03-11T23:07:00Z"
generation: 2
labels:
discovered_by: kubernetesplugin
name: gloo-system-testserver-1234
namespace: gloo-system
resourceVersion: "20440"
uid: a3df734a-1b6a-40c2-8530-40b22b612a0f
spec:
discoveryMetadata:
labels:
gloo: testserver
kube:
selector:
gloo: testserver
serviceName: testserver
serviceNamespace: gloo-system
servicePort: 1234
status:
statuses:
gloo-system:
reportedBy: gloo
state: Accepted
---
apiVersion: v1
kind: Endpoints
metadata:
annotations:
endpoints.kubernetes.io/last-change-trigger-time: "2024-03-11T23:07:00Z"
creationTimestamp: "2024-03-11T23:07:00Z"
labels:
gloo: testserver
name: testserver
namespace: gloo-system
resourceVersion: "20436"
uid: 67c66447-a918-413b-987c-db9c2d5d27ed
subsets:
- addresses:
- ip: 10.244.0.12
nodeName: solo-test-cluster-control-plane
targetRef:
kind: Pod
name: testserver
namespace: gloo-system
uid: 4fa124e7-50ed-42f9-b1c2-add228af06f5
ports:
- name: http
port: 1234
protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
creationTimestamp: "2024-03-11T23:07:00Z"
labels:
gloo: testserver
name: testserver
namespace: gloo-system
resourceVersion: "20435"
uid: 276417f0-3811-45c5-9882-caaef3389c18
spec:
clusterIP: 10.96.244.47
clusterIPs:
- 10.96.244.47
internalTrafficPolicy: Cluster
ipFamilies:
- IPv4
ipFamilyPolicy: SingleStack
ports:
- name: http
port: 1234
protocol: TCP
targetPort: 1234
selector:
gloo: testserver
sessionAffinity: None
type: ClusterIP
status:
loadBalancer: {}
---
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2024-03-11T20:41:05Z"
labels:
gloo: testserver
name: testserver
namespace: gloo-system
resourceVersion: "1200"
uid: 4fa124e7-50ed-42f9-b1c2-add228af06f5
spec:
containers:
- image: quay.io/solo-io/testrunner:v1.7.0-beta17
imagePullPolicy: IfNotPresent
name: testserver
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-9sktq
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: solo-test-cluster-control-plane
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: kube-api-access-9sktq
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2024-03-11T20:41:05Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2024-03-11T20:41:09Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2024-03-11T20:41:09Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2024-03-11T20:41:05Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://ed8459f4f66adc416565c43073f0a3db795657fa83dc815a418c442c14545c59
image: quay.io/solo-io/testrunner:v1.7.0-beta17
imageID: quay.io/solo-io/testrunner@sha256:8dbf8d9a4c499d4f54cf009a0862d9f62eb40429b731958bd0f644f18fed1d4b
lastState: {}
name: testserver
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2024-03-11T20:41:09Z"
hostIP: 172.18.0.2
phase: Running
podIP: 10.244.0.12
podIPs:
- ip: 10.244.0.12
qosClass: BestEffort
startTime: "2024-03-11T20:41:05Z"
These test suites were migrated to our new format, and the legacy tests have been or will be removed by Nina. Marking as closed.
Observed again on #9805, targeting 1.16:
• [FAILED] [34.247 seconds]
Gloo + Istio integration tests port settings should act as expected with varied ports [It] without target port, and port matching pod's port
/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:181
[FAILED] Timed out after 30.001s.
Expected
<string>: Status
to match fields: {
.State:
Expected
<core.Status_State>: 0
to equal
<core.Status_State>: 1
}
In [It] at: /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:123 @ 07/23/24 15:01:56.029
Full Stack Trace
github.com/solo-io/gloo/test/kube2e/istio_test.glob..func1.1.3(0x4d2, 0xffffffffffffffff)
/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:123 +0x886
github.com/solo-io/gloo/test/kube2e/istio_test.glob..func1.1.4(0x1e9c6d9?, 0x1ea0f51?, 0x1ea0f89?)
/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:164 +0x69
reflect.Value.call({0x67c5860?, 0xc0004e53e0?, 0x13?}, {0x6f32655, 0x4}, {0xc000964b90, 0x3, 0x3?})
/opt/hostedtoolcache/go/1.21.11/x64/src/reflect/value.go:596 +0x14ce
reflect.Value.Call({0x67c5860?, 0xc0004e53e0?, 0x7e5fcd0?}, {0xc000964b90, 0x3, 0x3})
/opt/hostedtoolcache/go/1.21.11/x64/src/reflect/value.go:380 +0xb6
------------------------------
SSSSSSSSS
Summarizing 1 Failure:
[FAIL] Gloo + Istio integration tests port settings should act as expected with varied ports [It] without target port, and port matching pod's port
/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:123
Ran 1 of 10 Specs in 112.084 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 9 Skipped