roachtest: replicate/wide failed
roachtest.replicate/wide failed with artifacts on master @ 6300c3c3367ad46ac48bf24915cf0d73cae446a0:
(allocator.go:359).func1: dial tcp 20.102.91.133:26257: connect: connection refused
test artifacts and logs in: /artifacts/replicate/wide/cpu_arch=arm64/run_1
Parameters:
- ROACHTEST_arch=arm64
- ROACHTEST_cloud=azure
- ROACHTEST_coverageBuild=false
- ROACHTEST_cpu=1
- ROACHTEST_encrypted=false
- ROACHTEST_fs=ext4
- ROACHTEST_localSSD=true
- ROACHTEST_metamorphicBuild=false
- ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
Grafana is not yet available for azure clusters
This test on roachdash | Improve this report!
Jira issue: CRDB-38768
Node 1 reports slow heartbeats and then terminates after detecting a disk stall:
F240515 12:54:52.692794 150690 1@util/log/file.go:269 ⋮ [-] 265 disk stall detected: unable to sync log files within 20s
Less than 10s later, the roachtest could not connect to the node:
12:55:01 test_impl.go:414: test failure #1: full stack retained in failure_1.log: (allocator.go:359).func1: dial tcp 20.102.91.133:26257: connect: connection refused
OTOH, this test asks node 1 to be decommissioned about a minute before the disk stall. I wonder whether these two events are correlated.
12:53:52 cluster.go:2371: running cmd `./cockroach node decommissi...` on nodes [:1]; details in run_125352.624956031_n1_cockroach-node-decom.log
12:53:53 allocator.go:403: 68 mis-replicated ranges
12:53:54 allocator.go:403: 47 mis-replicated ranges
12:53:55 allocator.go:403: 10 mis-replicated ranges
12:53:56 allocator.go:403: 0 mis-replicated ranges
12:53:57 allocator.go:374: SET CLUSTER SETTING server.time_until_store_dead = '90s'
F240515 12:54:52.692794 150690 1@util/log/file.go:269 ⋮ [-] 265 disk stall detected: unable to sync log files within 20s
12:55:01 test_impl.go:414: test failure #1: full stack retained in failure_1.log: (allocator.go:359).func1: dial tcp 20.102.91.133:26257: connect: connection refused
Noting that this is on Azure:
ROACHTEST_cloud=azure
Azure has had noticeably more roachtest instability recently, so I'd lean toward chalking this up to an infra flake.