TestConnectivity/testOVSFlowReplay flakiness
**Describe the bug**

I observed this e2e test failing during an execution of the "E2e tests on a Kind cluster on Linux with Antrea-native policies disabled" Kind job.
```
2022-04-14T23:04:20.2313389Z === RUN TestConnectivity/testOVSFlowReplay
2022-04-14T23:04:20.2316656Z connectivity_test.go:379: Creating 2 busybox test Pods on 'kind-worker'
2022-04-14T23:04:20.2468806Z connectivity_test.go:75: Waiting for Pods to be ready and retrieving IPs
2022-04-14T23:04:35.2520694Z connectivity_test.go:89: Retrieved all Pod IPs: map[test-pod-0-tdqpoygz:IPv4(10.10.1.9),IPstrings(10.10.1.9) test-pod-1-l3k7mx0w:IPv4(10.10.1.8),IPstrings(10.10.1.8)]
2022-04-14T23:04:35.2521416Z connectivity_test.go:98: Ping mesh test between all Pods
2022-04-14T23:04:39.3061520Z connectivity_test.go:115: Ping 'antrea-test/test-pod-0-tdqpoygz' -> 'antrea-test/test-pod-1-l3k7mx0w': OK
2022-04-14T23:04:43.3647605Z connectivity_test.go:115: Ping 'antrea-test/test-pod-1-l3k7mx0w' -> 'antrea-test/test-pod-0-tdqpoygz': OK
2022-04-14T23:04:43.3680521Z connectivity_test.go:395: The Antrea Pod for Node 'kind-worker' is 'antrea-agent-8gkrf'
2022-04-14T23:04:43.4281662Z connectivity_test.go:404: Counted 108 flow in OVS bridge 'br-int' for Node 'kind-worker'
2022-04-14T23:04:43.4965946Z connectivity_test.go:414: Counted 6 group in OVS bridge 'br-int' for Node 'kind-worker'
2022-04-14T23:04:43.4966684Z connectivity_test.go:422: Deleting flows / groups and restarting OVS daemons on Node 'kind-worker'
2022-04-14T23:04:43.9287198Z connectivity_test.go:446: Restarted OVS with ovs-ctl: stdout: * Saving flows
2022-04-14T23:04:43.9287975Z * Exiting ovsdb-server (45)
2022-04-14T23:04:43.9288362Z * Starting ovsdb-server
2022-04-14T23:04:43.9288746Z * Configuring Open vSwitch system IDs
2022-04-14T23:04:43.9289137Z * Exiting ovs-vswitchd (97)
2022-04-14T23:04:43.9289508Z * Starting ovs-vswitchd
2022-04-14T23:04:43.9289820Z * Restoring saved flows
2022-04-14T23:04:43.9290140Z * Enabling remote OVSDB managers
2022-04-14T23:04:43.9291145Z - stderr: 2022-04-14T23:04:43Z|00001|vconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: version negotiation failed (we support version 0x05, peer supports versions 0x01, 0x04)
2022-04-14T23:04:43.9291829Z ovs-ofctl: br-int: failed to connect to socket (Broken pipe)
2022-04-14T23:04:43.9292765Z 2022-04-14T23:04:43Z|00001|vconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: version negotiation failed (we support version 0x05, peer supports versions 0x01, 0x04)
2022-04-14T23:04:43.9293406Z ovs-ofctl: br-int: failed to connect to socket (Broken pipe)
2022-04-14T23:04:43.9294349Z 2022-04-14T23:04:43Z|00001|vconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: version negotiation failed (we support version 0x05, peer supports versions 0x01, 0x04)
2022-04-14T23:04:43.9295016Z ovs-ofctl: br-int: failed to connect to socket (Protocol error)
2022-04-14T23:04:43.9296066Z 2022-04-14T23:04:43Z|00001|vconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: version negotiation failed (we support version 0x05, peer supports versions 0x01, 0x04)
2022-04-14T23:04:43.9296701Z ovs-ofctl: br-int: failed to connect to socket (Broken pipe)
2022-04-14T23:04:43.9297227Z connectivity_test.go:451: Running second ping mesh to check that flows have been restored
2022-04-14T23:04:43.9299420Z connectivity_test.go:75: Waiting for Pods to be ready and retrieving IPs
2022-04-14T23:04:45.9359264Z connectivity_test.go:89: Retrieved all Pod IPs: map[test-pod-0-tdqpoygz:IPv4(10.10.1.9),IPstrings(10.10.1.9) test-pod-1-l3k7mx0w:IPv4(10.10.1.8),IPstrings(10.10.1.8)]
2022-04-14T23:04:45.9359892Z connectivity_test.go:98: Ping mesh test between all Pods
2022-04-14T23:04:49.9959168Z connectivity_test.go:115: Ping 'antrea-test/test-pod-0-tdqpoygz' -> 'antrea-test/test-pod-1-l3k7mx0w': OK
2022-04-14T23:04:54.0443096Z connectivity_test.go:115: Ping 'antrea-test/test-pod-1-l3k7mx0w' -> 'antrea-test/test-pod-0-tdqpoygz': OK
2022-04-14T23:04:54.1076331Z connectivity_test.go:404: Counted 103 flow in OVS bridge 'br-int' for Node 'kind-worker'
2022-04-14T23:04:54.1844760Z connectivity_test.go:414: Counted 6 group in OVS bridge 'br-int' for Node 'kind-worker'
2022-04-14T23:04:54.1845241Z connectivity_test.go:455:
2022-04-14T23:04:54.1845646Z Error Trace: connectivity_test.go:455
2022-04-14T23:04:54.1846236Z connectivity_test.go:66
2022-04-14T23:04:54.1846648Z Error: Not equal:
2022-04-14T23:04:54.1847009Z expected: 108
2022-04-14T23:04:54.1847435Z actual : 103
2022-04-14T23:04:54.1847853Z Test: TestConnectivity/testOVSFlowReplay
2022-04-14T23:04:54.1848330Z Messages: Mismatch in OVS flow count after flow replay
2022-04-14T23:04:54.1848863Z fixtures.go:407: Deleting Pod 'test-pod-1-l3k7mx0w'
2022-04-14T23:04:54.1875728Z fixtures.go:407: Deleting Pod 'test-pod-0-tdqpoygz'
```
Note that the OVS error logs in the output above are harmless and can be ignored. However, there is a flow count mismatch after the replay: 103 flows instead of the expected 108. The test is hard to troubleshoot at the current log level, so I propose dumping the flows when the test fails, as sketched below: if the test fails again, the dump will make it easy to identify which flows were not restored and fix the test to avoid future flakiness.
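A helper along these lines could be added to connectivity_test.go for that purpose. This is a minimal sketch assuming the e2e framework's existing `TestData.RunCommandFromPod` helper and constants such as `antreaNamespace` and `ovsContainerName` (the exact names may differ):

```go
// dumpOVSFlows logs all flows currently installed in the br-int bridge of the
// given Antrea agent Pod, so that a flow count mismatch can be diagnosed
// directly from the CI logs instead of requiring a local reproduction.
func dumpOVSFlows(t *testing.T, data *TestData, antreaPodName string) {
	cmd := []string{"ovs-ofctl", "dump-flows", "br-int"}
	// Run the command in the OVS container of the agent Pod and capture its output.
	stdout, stderr, err := data.RunCommandFromPod(antreaNamespace, antreaPodName, ovsContainerName, cmd)
	if err != nil {
		t.Logf("Error when dumping flows: stderr: <%s>, err: <%v>", stderr, err)
		return
	}
	t.Logf("Flows in OVS bridge 'br-int' for Pod '%s':\n%s", antreaPodName, stdout)
}
```

Since testify's `assert.Equal` returns false when the assertion fails, the call site could trigger the dump only on a mismatch, e.g. `if !assert.Equal(t, numFlows1, numFlows2, "Mismatch in OVS flow count after flow replay") { dumpOVSFlows(t, data, antreaPodName) }`.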