
[test] TestBalancerUnderNetworkPartitionDelete failure

Open ahrtr opened this issue 3 years ago • 1 comments

Raised on the latest code on main branch.

=== FAIL: integration/clientv3/connectivity TestBalancerUnderNetworkPartitionDelete (8.12s)
    cluster.go:532: Creating listener with addr: 127.0.0.1:2109805817
......
    logger.go:130: 2022-08-10T22:44:54.878Z	WARN	m1	request stats	{"member": "m1", "start time": "2022-08-10T22:44:53.878Z", "time spent": "1.000107558s", "remote": "@", "response type": "/etcdserverpb.KV/DeleteRange", "request count": 0, "request size": 3, "response count": 0, "response size": 0, "request content": "key:\"a\" "}
    logger.go:130: 2022-08-10T22:44:54.879Z	WARN	client	retrying of unary invoker failed	{"target": "etcd-endpoints://0xc000550f00/localhost:m0", "method": "/etcdserverpb.KV/DeleteRange", "attempt": 0, "error": "rpc error: code = Unavailable desc = etcdserver: request timed out, possibly due to previous leader failure"}
    network_partition_test.go:137: Op returned error: etcdserver: request timed out, possibly due to previous leader failure
    network_partition_test.go:138: Cancelling...
    network_partition_test.go:144: #0: expected 'expected error', got 'etcdserver: request timed out, possibly due to previous leader failure'
    logger.go:130: 2022-08-10T22:44:54.993Z	INFO	m0.raft	c16a4db1d4d2aea3 is starting a new election at term 8	{"member": "m0"}
......
    logger.go:130: 2022-08-10T22:44:56.922Z	INFO	m0.raft	c16a4db1d4d2aea3 became candidate at term 21	{"member": "m0"}
Saved JUnit XML test report to /home/runner/work/etcd/etcd/linux-amd64-integration-4-cpu/junit_MTY2MDE3MTMxMwo.xml
FAIL: 'integration' failed at Wed Aug 10 22:48:23 UTC 2022
......
     logger.go:130: 2022-08-10T22:44:59.879Z	DEBUG	client	retrying of unary invoker	{"target": "etcd-endpoints://0xc000550f00/localhost:m0", "method": "/etcdserverpb.KV/DeleteRange", "attempt": 0}
    network_partition_test.go:137: Op returned error: <nil>
    network_partition_test.go:138: Cancelling...
    logger.go:130: 2022-08-10T22:44:59.880Z	INFO	grpc	[[core] [Channel #329] Channel Connectivity change to SHUTDOWN]
......
    logger.go:130: 2022-08-10T22:44:59.889Z	INFO	m1	terminated a member	{"member": "m1", "name": "m1", "advertise-peer-urls": ["unix://127.0.0.1:2110105817"], "listen-client-urls": ["unix://127.0.0.1:2110205817"], "grpc-url": "unix://localhost:m1"}
    cluster.go:1392: ========= Cluster termination succeeded ===================

DONE 519 tests, 2 skipped, 1 failure in 1.199s
Error: Process completed with exit code 255.

Refer to https://github.com/etcd-io/etcd/runs/7777195728?check_suite_focus=true

ahrtr, Aug 11 '22 00:08

After digging into this a bit, I think the failure comes from this line:

2022-08-10T22:48:23.0647849Z network_partition_test.go:144: #0: expected 'expected error', got 'etcdserver: request timed out, possibly due to previous leader failure'

So in: https://github.com/etcd-io/etcd/blob/a1fb9ff1e4de40337735d07ca0773cfc242ad00f/tests/integration/clientv3/connectivity/network_partition_test.go#L52-L55

This error isn't caught by IsClientTimeout. Should it be added there as a transient error, or does it indicate a different issue?

tjungblu, Aug 24 '22 11:08

It seems that the issue has been fixed by #14377.

fuweid, Oct 06 '22 04:10