roachtest: tpcc/mixed-headroom/n5cpu16 failed

Open cockroach-teamcity opened this issue 3 years ago • 21 comments

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 8d34ef1ea15850ee1c70470610b6652df4c317de:

		  |  1766.0s        0            6.0           18.5  28991.0  34359.7  34359.7  34359.7 stockLevel
		  |  1767.0s        0           11.0           18.6  38654.7  38654.7  40802.2  40802.2 delivery
		  |  1767.0s        0           41.1          186.3  33286.0  40802.2  40802.2  40802.2 newOrder
		  |  1767.0s        0            0.0           18.6      0.0      0.0      0.0      0.0 orderStatus
		  |  1767.0s        0           14.0          186.1  32212.3  33286.0  42949.7  42949.7 payment
		  |  1767.0s        0            2.0           18.5  27917.3  28991.0  28991.0  28991.0 stockLevel
		  |  1768.0s        0            3.0           18.6  40802.2  40802.2  40802.2  40802.2 delivery
		  |  1768.0s        0           51.9          186.2  38654.7  42949.7  45097.2  47244.6 newOrder
		  |  1768.0s        0            1.0           18.6  38654.7  38654.7  38654.7  38654.7 orderStatus
		  |  1768.0s        0            7.0          186.0  28991.0  40802.2  40802.2  40802.2 payment
		  |  1768.0s        0            1.0           18.5  36507.2  36507.2  36507.2  36507.2 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	mixed_version_jobs.go:73,versionupgrade.go:178,tpcc.go:427,test_runner.go:897: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:178
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:427
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:897
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0
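
For triage, the per-second rows in the workload output above can be scanned mechanically. The following is a minimal sketch, not part of roachtest: the column order is taken from the `_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)` header the workload prints, while the names `Sample`, `parse_sample`, `is_stalled`, and the 10-second p50 threshold are hypothetical choices made for illustration.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    elapsed_s: float
    errors: int
    ops_inst: float   # ops/sec(inst)
    ops_cum: float    # ops/sec(cum)
    p50_ms: float
    p95_ms: float
    p99_ms: float
    pmax_ms: float
    op: str


def parse_sample(line: str) -> Optional[Sample]:
    """Parse one per-second histogram row; return None for headers and other lines."""
    fields = line.replace("|", " ").split()
    if len(fields) != 9 or not fields[0].endswith("s"):
        return None
    try:
        return Sample(float(fields[0][:-1]), int(fields[1]),
                      *map(float, fields[2:8]), fields[8])
    except ValueError:
        return None


def is_stalled(s: Sample, p50_limit_ms: float = 10_000.0) -> bool:
    # Assumed heuristic: no completed ops in this second, or median latency at/over the limit.
    return s.ops_inst == 0.0 or s.p50_ms >= p50_limit_ms


rows = [
    "  |  1767.0s        0           41.1          186.3  33286.0  40802.2  40802.2  40802.2 newOrder",
    "  |  1767.0s        0            0.0           18.6      0.0      0.0      0.0      0.0 orderStatus",
    "  |   300.0s        0          195.1           94.5     32.5     41.9     50.3     52.4 newOrder",
]
for row in rows:
    s = parse_sample(row)
    print(s.op, s.elapsed_s, "stalled" if is_stalled(s) else "ok")
```

Rows with a p50 in the tens of seconds, or with zero instantaneous throughput, are the ones worth lining up against node-level events such as restarts or OOM kills.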

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

  • #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-bulkio branch-release-21.2]
/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-16849

Epic CRDB-19172

cockroach-teamcity avatar Jun 19 '22 15:06 cockroach-teamcity

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 8d34ef1ea15850ee1c70470610b6652df4c317de:

		  |   664.0s        0            1.0           18.3  13421.8  13421.8  13421.8  13421.8 stockLevel
		  |   665.0s        0            0.0           18.4      0.0      0.0      0.0      0.0 delivery
		  |   665.0s        0            4.0          184.8  17179.9  19327.4  19327.4  19327.4 newOrder
		  |   665.0s        0            0.0           18.4      0.0      0.0      0.0      0.0 orderStatus
		  |   665.0s        0            0.0          182.9      0.0      0.0      0.0      0.0 payment
		  |   665.0s        0            0.0           18.3      0.0      0.0      0.0      0.0 stockLevel
		  |   666.0s        0            0.0           18.3      0.0      0.0      0.0      0.0 delivery
		  |   666.0s        0            0.0          184.5      0.0      0.0      0.0      0.0 newOrder
		  |   666.0s        0            0.0           18.4      0.0      0.0      0.0      0.0 orderStatus
		  |   666.0s        0            0.0          182.6      0.0      0.0      0.0      0.0 payment
		  |   666.0s        0            0.0           18.3      0.0      0.0      0.0      0.0 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	mixed_version_jobs.go:73,versionupgrade.go:178,tpcc.go:427,test_runner.go:897: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:178
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:427
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:897
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

cockroach-teamcity avatar Jun 20 '22 15:06 cockroach-teamcity

Node 3 was OOM-killed in the first failure (node 2 in the second):

[ 7780.081918] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/cockroach.service,task=cockroach,pid=2867347,uid=1000
[ 7780.082013] Out of memory: Killed process 2867347 (cockroach) total-vm:21357384kB, anon-rss:13339668kB, file-rss:1236kB, shmem-rss:0kB, UID:1000 pgtables:38656kB oom_score_adj:0
[ 7780.734170] oom_reaper: reaped process 2867347 (cockroach), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
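
To put the kernel's figures in more readable units, here is a small sketch that pulls the process name, PID, and memory sizes out of the `Out of memory: Killed process` line above. The regular expression and the function name `parse_oom_kill` are assumptions made for illustration; the pattern is fitted to the line as it appears here and may need adjusting for other kernel versions.

```python
import re

# Matches the "Out of memory: Killed process ..." line as printed in the dmesg excerpt above.
OOM_LINE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kill(line):
    """Return a dict describing the killed process, or None if the line does not match."""
    m = OOM_LINE.search(line)
    if m is None:
        return None
    kib_per_gib = 1024 ** 2
    return {
        "pid": int(m["pid"]),
        "name": m["name"],
        "total_vm_gib": int(m["total_vm_kb"]) / kib_per_gib,
        "anon_rss_gib": int(m["anon_rss_kb"]) / kib_per_gib,
    }


line = (
    "[ 7780.082013] Out of memory: Killed process 2867347 (cockroach) "
    "total-vm:21357384kB, anon-rss:13339668kB, file-rss:1236kB, shmem-rss:0kB, "
    "UID:1000 pgtables:38656kB oom_score_adj:0"
)
info = parse_oom_kill(line)
print(f"{info['name']} pid={info['pid']} anon-rss={info['anon_rss_gib']:.1f} GiB")
```

For the line above this reports an anonymous RSS of roughly 12.7 GiB for the `cockroach` process at the time it was killed.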

lidorcarmel avatar Jun 21 '22 23:06 lidorcarmel

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 13cb2f6c40e3146fed8d931f65f89da9b42ce2c1:

		  |    17.0s        0            0.0            8.2      0.0      0.0      0.0      0.0 stockLevel
		  |    18.0s        0            3.0            6.9   8053.1  10737.4  10737.4  10737.4 delivery
		  |    18.0s        0           82.1           65.6  10737.4  13958.6  15569.3  18253.6 newOrder
		  |    18.0s        0            5.0            8.0   6442.5   7247.8   7247.8   7247.8 orderStatus
		  |    18.0s        0           48.1           74.9  10737.4  11811.2  12884.9  12884.9 payment
		  |    18.0s        0            0.0            7.8      0.0      0.0      0.0      0.0 stockLevel
		  |    19.0s        0            5.0            6.8  10200.5  10200.5  10200.5  10200.5 delivery
		  |    19.0s        0           22.0           63.3  11811.2  14495.5  15032.4  15032.4 newOrder
		  |    19.0s        0            0.0            7.6      0.0      0.0      0.0      0.0 orderStatus
		  |    19.0s        0           52.0           73.7  10737.4  11811.2  11811.2  12348.0 payment
		  |    19.0s        0            0.0            7.4      0.0      0.0      0.0      0.0 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	mixed_version_jobs.go:73,versionupgrade.go:178,tpcc.go:427,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:178
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:427
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

cockroach-teamcity avatar Jun 22 '22 14:06 cockroach-teamcity

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 457d724622e4fa2e62d6f4e7926509dbc7d18511:

		  |   785.0s        0            0.0           18.9      0.0      0.0      0.0      0.0 stockLevel
		  |   786.0s        0            0.0           18.8      0.0      0.0      0.0      0.0 delivery
		  |   786.0s        0            0.0          189.1      0.0      0.0      0.0      0.0 newOrder
		  |   786.0s        0            0.0           18.9      0.0      0.0      0.0      0.0 orderStatus
		  |   786.0s        0            0.0          188.5      0.0      0.0      0.0      0.0 payment
		  |   786.0s        0            0.0           18.9      0.0      0.0      0.0      0.0 stockLevel
		  |   787.0s        0            0.0           18.8      0.0      0.0      0.0      0.0 delivery
		  |   787.0s        0            0.0          188.8      0.0      0.0      0.0      0.0 newOrder
		  |   787.0s        0            0.0           18.9      0.0      0.0      0.0      0.0 orderStatus
		  |   787.0s        0            0.0          188.2      0.0      0.0      0.0      0.0 payment
		  |   787.0s        0            0.0           18.9      0.0      0.0      0.0      0.0 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	mixed_version_jobs.go:73,versionupgrade.go:188,tpcc.go:433,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:188
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:433
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

cockroach-teamcity avatar Jul 21 '22 13:07 cockroach-teamcity

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 773f7d4445ce3e0e806b7a182adba70a0f270f19:

		  |   298.0s        0          168.0           93.8     31.5     44.0     48.2     60.8 newOrder
		  |   298.0s        0           13.0           10.1      6.0      9.4     10.5     10.5 orderStatus
		  |   298.0s        0          201.0          100.4     18.9     32.5     50.3     56.6 payment
		  |   298.0s        0           15.0           10.0     27.3     65.0     92.3     92.3 stockLevel
		  |   299.0s        0           12.0           10.0     58.7     62.9     62.9     62.9 delivery
		  |   299.0s        0          203.8           94.2     33.6     65.0     79.7     83.9 newOrder
		  |   299.0s        0           27.0           10.2      6.8      8.1     13.6     13.6 orderStatus
		  |   299.0s        0          214.8          100.8     21.0     50.3     67.1     75.5 payment
		  |   299.0s        0           24.0           10.0     33.6     62.9     88.1     88.1 stockLevel
		  |   300.0s        0           18.0           10.0     60.8     92.3    100.7    100.7 delivery
		  |   300.0s        0          195.1           94.5     32.5     41.9     50.3     52.4 newOrder
		  |   300.0s        0           20.0           10.2      6.0      7.1      7.1      7.1 orderStatus
		  |   300.0s        0          174.1          101.0     21.0     32.5     39.8     46.1 payment
		  |   300.0s        0           15.0           10.1     26.2     37.7     50.3     50.3 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |     1.0s        0           20.1           20.1     60.8     79.7    109.1    109.1 delivery
		  |     1.0s        0          211.3          211.4     35.7     52.4     62.9     67.1 newOrder
		  |     1.0s        0           26.2           26.2      6.6      8.9      8.9      8.9 orderStatus
		  |     1.0s        0          166.0          166.1     21.0     33.6     41.9     48.2 payment
		  |     1.0s        0           15.1           15.1     26.2     44.0     46.1     46.1 stockLevel
		  |     2.0s        0           20.0           20.1     58.7     67.1     79.7     79.7 delivery
		  |     2.0s        0          166.0          188.6     31.5     44.0     46.1     54.5 newOrder
		  |     2.0s        0           14.0           20.1      6.0      7.6      8.9      8.9 orderStatus
		  |     2.0s        0          214.0          190.1     19.9     28.3     33.6     33.6 payment
		  |     2.0s        0           17.0           16.1     32.5     48.2     71.3     71.3 stockLevel
		  |     3.0s        0           23.0           21.0     58.7     83.9     83.9     83.9 delivery
		  |     3.0s        0          175.1          184.1     32.5     39.8     46.1     46.1 newOrder
		  |     3.0s        0           12.0           17.4      6.3      7.6      8.1      8.1 orderStatus
		  |     3.0s        0          214.1          198.1     21.0     29.4     46.1     58.7 payment
		  |     3.0s        0           19.0           17.0     31.5     48.2     54.5     54.5 stockLevel
		  |     4.0s        0           14.0           19.3     60.8     88.1     92.3     92.3 delivery
		  |     4.0s        0          220.8          193.3     33.6     52.4     65.0     83.9 newOrder
		  |     4.0s        0           20.0           18.0      6.6      8.9     10.5     10.5 orderStatus
		  |     4.0s        0          168.9          190.8     22.0     39.8     50.3     62.9 payment
		  |     4.0s        0           22.0           18.3     33.6     46.1     62.9     62.9 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |     5.0s        0           15.0           18.4     79.7     96.5    117.4    117.4 delivery
		  |     5.0s        0          196.1          193.9     35.7     52.4     71.3     83.9 newOrder
		  |     5.0s        0           11.0           16.6      7.6     11.0     11.5     11.5 orderStatus
		  |     5.0s        0          164.1          185.4     23.1     46.1     65.0     65.0 payment
		  |     5.0s        0           10.0           16.6     41.9     50.3     50.3     50.3 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	versionupgrade.go:502,versionupgrade.go:188,tpcc.go:433,test_runner.go:896: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

cockroach-teamcity avatar Aug 12 '22 16:08 cockroach-teamcity

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ f4042d47fa8062a612c38d4696eb6bee9cee7c21:

		  |   255.0s        0          150.9           87.0     19.9     31.5     41.9     46.1 payment
		  |   255.0s        0           19.0            8.8     25.2     48.2     52.4     52.4 stockLevel
		  |   256.0s        0           19.0            8.6     65.0     83.9     83.9     83.9 delivery
		  |   256.0s        0          196.1           80.1     35.7     46.1     50.3     56.6 newOrder
		  |   256.0s        0           18.0            8.9      7.1     11.0     15.7     15.7 orderStatus
		  |   256.0s        0          158.1           87.3     21.0     31.5     35.7     41.9 payment
		  |   256.0s        0           20.0            8.8     26.2     41.9     56.6     56.6 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   257.0s        0           21.0            8.6     56.6     65.0     67.1     67.1 delivery
		  |   257.0s        0          159.0           80.4     33.6     44.0     44.0     48.2 newOrder
		  |   257.0s        0           17.0            8.9      7.6      8.9      9.4      9.4 orderStatus
		  |   257.0s        0          177.0           87.7     22.0     32.5     35.7     35.7 payment
		  |   257.0s        0           14.0            8.9     26.2     41.9     46.1     46.1 stockLevel
		  |   258.0s        0           15.0            8.6     58.7     65.0     75.5     75.5 delivery
		  |   258.0s        0          159.0           80.7     33.6     44.0     50.3     56.6 newOrder
		  |   258.0s        0           18.0            8.9      6.8      8.9      9.4      9.4 orderStatus
		  |   258.0s        0          163.0           88.0     19.9     28.3     33.6     35.7 payment
		  |   258.0s        0           17.0            8.9     25.2     46.1     46.1     46.1 stockLevel
		  |   259.0s        0           14.0            8.7     58.7     65.0     71.3     71.3 delivery
		  |   259.0s        0          154.8           81.0     33.6     41.9     46.1     54.5 newOrder
		  |   259.0s        0           18.0            9.0      6.0     10.5     13.1     13.1 orderStatus
		  |   259.0s        0          156.8           88.2     21.0     28.3     30.4     44.0 payment
		  |   259.0s        0           15.0            8.9     33.6     52.4     62.9     62.9 stockLevel
		  |   260.0s        0           15.0            8.7     62.9     79.7    151.0    151.0 delivery
		  |   260.0s        0          163.1           81.3     35.7     44.0     48.2     52.4 newOrder
		  |   260.0s        0           15.0            9.0      6.6      8.9     10.0     10.0 orderStatus
		  |   260.0s        0          164.1           88.5     21.0     29.4     33.6     35.7 payment
		  |   260.0s        0           21.0            9.0     31.5     48.2     60.8     60.8 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   261.0s        0           12.0            8.7     58.7     65.0     71.3     71.3 delivery
		  |   261.0s        0          165.9           81.6     35.7     44.0     54.5     58.7 newOrder
		  |   261.0s        0           20.0            9.0      7.3      8.1     12.1     12.1 orderStatus
		  |   261.0s        0          184.9           88.9     22.0     30.4     37.7     50.3 payment
		  |   261.0s        0           30.0            9.0     26.2     46.1     48.2     48.2 stockLevel
		  |   262.0s        0           22.0            8.8     58.7     79.7     83.9     83.9 delivery
		  |   262.0s        0          159.1           81.9     35.7     48.2     54.5     54.5 newOrder
		  |   262.0s        0           19.0            9.1      6.8      8.1      8.4      8.4 orderStatus
		  |   262.0s        0          162.1           89.2     21.0     27.3     29.4     35.7 payment
		  |   262.0s        0           13.0            9.1     24.1     39.8     48.2     48.2 stockLevel
		Wraps: (8) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 5. Command with error:
		  |   | ``````
		  |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (9) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

cockroach-teamcity avatar Aug 16 '22 15:08 cockroach-teamcity

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ a0d8839aa6164af81a9ebb140147d3baf5321287:

		  |    73.0s        0            4.0            3.2      8.9     10.0     10.0     10.0 orderStatus
		  |    73.0s        0           54.0           31.2     18.9     50.3     52.4     52.4 payment
		  |    73.0s        0            5.0            3.2     33.6     48.2     48.2     48.2 stockLevel
		  |    74.0s        0            4.0            3.5     71.3     88.1     88.1     88.1 delivery
		  |    74.0s        0           40.0           20.2     32.5     52.4     60.8     60.8 newOrder
		  |    74.0s        0            5.0            3.2      8.9     10.5     10.5     10.5 orderStatus
		  |    74.0s        0           48.0           31.4     16.3     30.4     37.7     37.7 payment
		  |    74.0s        0            5.0            3.2     37.7     41.9     41.9     41.9 stockLevel
		  |    75.0s        0            5.0            3.5     96.5    167.8    167.8    167.8 delivery
		  |    75.0s        0           46.0           20.6     48.2     92.3    109.1    109.1 newOrder
		  |    75.0s        0            4.0            3.2      9.4     10.5     10.5     10.5 orderStatus
		  |    75.0s        0           58.0           31.7     24.1     60.8     67.1     67.1 payment
		  |    75.0s        0            8.0            3.3     33.6     79.7     79.7     79.7 stockLevel
		  |    76.0s        0            5.0            3.5     83.9    121.6    121.6    121.6 delivery
		  |    76.0s        0           39.0           20.8     52.4     75.5     92.3     92.3 newOrder
		  |    76.0s        0            7.0            3.2      7.9     24.1     24.1     24.1 orderStatus
		  |    76.0s        0           61.0           32.1     18.9     60.8     62.9     62.9 payment
		  |    76.0s        0            5.0            3.3     33.6     41.9     41.9     41.9 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |    77.0s        0            5.0            3.5     71.3    104.9    104.9    104.9 delivery
		  |    77.0s        0           46.0           21.1     79.7    134.2    167.8    167.8 newOrder
		  |    77.0s        0            2.0            3.2      7.6      9.4      9.4      9.4 orderStatus
		  |    77.0s        0           55.0           32.4     37.7     75.5     88.1    109.1 payment
		  |    77.0s        0            7.0            3.4     32.5     54.5     54.5     54.5 stockLevel
		  |    78.0s        0            4.0            3.6     71.3    134.2    134.2    134.2 delivery
		  |    78.0s        0           45.0           21.4     75.5    117.4    121.6    121.6 newOrder
		  |    78.0s        0            9.0            3.3      7.6     11.0     11.0     11.0 orderStatus
		  |    78.0s        0           50.0           32.7     25.2     79.7    104.9    104.9 payment
		  |    78.0s        0            3.0            3.4     30.4     54.5     54.5     54.5 stockLevel
		  |    79.0s        0            8.0            3.6    117.4    121.6    121.6    121.6 delivery
		  |    79.0s        0           45.0           21.7     75.5    121.6    159.4    159.4 newOrder
		  |    79.0s        0            5.0            3.3      8.9     11.0     11.0     11.0 orderStatus
		  |    79.0s        0           44.0           32.8     21.0     62.9     88.1     88.1 payment
		  |    79.0s        0           10.0            3.5     27.3     35.7     35.7     35.7 stockLevel
		  |    80.0s        0            6.0            3.6    104.9    117.4    117.4    117.4 delivery
		  |    80.0s        0           55.0           22.2     67.1    121.6    130.0    159.4 newOrder
		  |    80.0s        0            4.0            3.3      7.3     12.1     12.1     12.1 orderStatus
		  |    80.0s        0           64.0           33.2     18.9     79.7    113.2    113.2 payment
		  |    80.0s        0            5.0            3.5     44.0     58.7     58.7     58.7 stockLevel
		Wraps: (8) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 5. Command with error:
		  |   | ``````
		  |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (9) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

cockroach-teamcity avatar Aug 19 '22 16:08 cockroach-teamcity

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ aaf50e920ceff3c2863ab96b9e3614b8434b70a8:

		  |   285.0s        0           12.0            9.7      7.3      7.9     16.8     16.8 orderStatus
		  |   285.0s        0          181.8           96.5     19.9     41.9     48.2     65.0 payment
		  |   285.0s        0           25.0            9.6     24.1     48.2     60.8     60.8 stockLevel
		  |   286.0s        0           11.0            9.6     60.8     79.7     92.3     92.3 delivery
		  |   286.0s        0          196.2           90.0     32.5     48.2     56.6     67.1 newOrder
		  |   286.0s        0           25.0            9.7      7.1      9.4      9.4      9.4 orderStatus
		  |   286.0s        0          192.2           96.8     19.9     30.4     41.9     52.4 payment
		  |   286.0s        0           24.0            9.7     24.1     50.3     54.5     54.5 stockLevel
		  |   287.0s        0           20.0            9.6     58.7     71.3     92.3     92.3 delivery
		  |   287.0s        0          172.9           90.3     32.5     52.4     58.7     92.3 newOrder
		  |   287.0s        0           18.0            9.7      6.3      7.9      8.4      8.4 orderStatus
		  |   287.0s        0          171.9           97.1     18.9     27.3     37.7     39.8 payment
		  |   287.0s        0           21.0            9.7     24.1     54.5     58.7     58.7 stockLevel
		  |   288.0s        0           14.0            9.6     56.6     71.3     75.5     75.5 delivery
		  |   288.0s        0          193.0           90.7     32.5     46.1     56.6     60.8 newOrder
		  |   288.0s        0           18.0            9.8      6.8      8.9      9.4      9.4 orderStatus
		  |   288.0s        0          193.0           97.4     18.9     26.2     28.3     30.4 payment
		  |   288.0s        0           18.0            9.7     25.2     44.0     44.0     44.0 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   289.0s        0           16.0            9.6     65.0     71.3     71.3     71.3 delivery
		  |   289.0s        0          173.0           91.0     31.5     41.9     50.3     79.7 newOrder
		  |   289.0s        0           17.0            9.8      7.6      9.4     10.5     10.5 orderStatus
		  |   289.0s        0          213.0           97.8     19.9     31.5     44.0     50.3 payment
		  |   289.0s        0           22.0            9.8     25.2     48.2     48.2     48.2 stockLevel
		  |   290.0s        0           18.0            9.7     60.8     71.3     75.5     75.5 delivery
		  |   290.0s        0          183.0           91.3     32.5     50.3     60.8     65.0 newOrder
		  |   290.0s        0           20.0            9.8      6.3      8.1      8.1      8.1 orderStatus
		  |   290.0s        0          195.0           98.1     19.9     32.5     46.1     52.4 payment
		  |   290.0s        0           10.0            9.8     27.3     44.0     44.0     44.0 stockLevel
		  |   291.0s        0           20.0            9.7     60.8     83.9     96.5     96.5 delivery
		  |   291.0s        0          172.0           91.6     31.5     46.1     54.5     58.7 newOrder
		  |   291.0s        0           22.0            9.9      7.3     12.1     12.6     12.6 orderStatus
		  |   291.0s        0          173.0           98.4     19.9     28.3     37.7     41.9 payment
		  |   291.0s        0           17.0            9.8     21.0     56.6     60.8     60.8 stockLevel
		  |   292.0s        0           20.0            9.7     60.8     75.5    109.1    109.1 delivery
		  |   292.0s        0          186.9           91.9     31.5     44.0     50.3     50.3 newOrder
		  |   292.0s        0           14.0            9.9      6.3      8.4     12.1     12.1 orderStatus
		  |   292.0s        0          188.9           98.7     19.9     29.4     33.6     37.7 payment
		  |   292.0s        0           23.0            9.8     22.0     48.2     48.2     48.2 stockLevel
		Wraps: (8) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 5. Command with error:
		  |   | ``````
		  |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (9) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

cockroach-teamcity avatar Aug 22 '22 15:08 cockroach-teamcity

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 80c274877a917580af62be6eb0cd48c8c7ae9c08:

		  |    94.0s        0          169.8          184.4     16.8     23.1     26.2     27.3 payment
		  |    94.0s        0           15.0           18.3     18.9     35.7     37.7     37.7 stockLevel
		  |    95.0s        0           16.0           18.8     56.6     62.9     65.0     65.0 delivery
		  |    95.0s        0          202.0          194.8     28.3     44.0     48.2     56.6 newOrder
		  |    95.0s        0           32.0           18.6      6.8      9.4     10.0     10.0 orderStatus
		  |    95.0s        0          214.0          184.7     16.3     24.1     32.5     35.7 payment
		  |    95.0s        0           22.0           18.4     19.9     35.7     37.7     37.7 stockLevel
		  |    96.0s        0           31.0           18.9     54.5    201.3    209.7    209.7 delivery
		  |    96.0s        0          200.2          194.9     32.5    151.0    184.5    192.9 newOrder
		  |    96.0s        0           21.0           18.6      6.6     10.5     10.5     10.5 orderStatus
		  |    96.0s        0          186.2          184.8     18.9    104.9    130.0    176.2 payment
		  |    96.0s        0           20.0           18.4     15.7     41.9     46.1     46.1 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |    97.0s        0           25.0           19.0     52.4     62.9     83.9     83.9 delivery
		  |    97.0s        0          214.0          195.1     29.4     39.8     44.0     48.2 newOrder
		  |    97.0s        0           15.0           18.6      6.6      7.9      8.9      8.9 orderStatus
		  |    97.0s        0          190.0          184.8     17.8     26.2     28.3     31.5 payment
		  |    97.0s        0           22.0           18.4     18.9     44.0     54.5     54.5 stockLevel
		  |    98.0s        0           18.0           19.0     52.4     56.6     58.7     58.7 delivery
		  |    98.0s        0          190.9          195.0     29.4     37.7     44.0     46.1 newOrder
		  |    98.0s        0           13.0           18.5      6.0      7.9      7.9      7.9 orderStatus
		  |    98.0s        0          192.9          184.9     16.3     23.1     25.2     27.3 payment
		  |    98.0s        0           17.0           18.4     23.1     37.7     41.9     41.9 stockLevel
		  |    99.0s        0           17.0           19.0     54.5     56.6     96.5     96.5 delivery
		  |    99.0s        0          187.1          194.9     27.3     37.7     50.3     52.4 newOrder
		  |    99.0s        0           30.0           18.7      6.3      9.4     10.0     10.0 orderStatus
		  |    99.0s        0          205.1          185.1     16.3     23.1     30.4     35.7 payment
		  |    99.0s        0           13.0           18.4     19.9     37.7     44.0     44.0 stockLevel
		  |   100.0s        0           32.0           19.1     54.5     65.0     67.1     67.1 delivery
		  |   100.0s        0          196.0          195.0     29.4     44.0     58.7     58.7 newOrder
		  |   100.0s        0           16.0           18.6      5.5     10.0     12.1     12.1 orderStatus
		  |   100.0s        0          184.0          185.1     17.8     24.1     39.8     44.0 payment
		  |   100.0s        0           19.0           18.4     17.8     27.3     37.7     37.7 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   101.0s        0           21.0           19.1     54.5     71.3     79.7     79.7 delivery
		  |   101.0s        0          183.0          194.8     29.4     44.0     52.4     56.6 newOrder
		  |   101.0s        0           21.0           18.7      6.3      8.1      8.4      8.4 orderStatus
		  |   101.0s        0          174.0          185.0     17.8     29.4     35.7     39.8 payment
		  |   101.0s        0           15.0           18.3     16.8     22.0     39.8     39.8 stockLevel
		Wraps: (8) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 5. Command with error:
		  |   | ``````
		  |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (9) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

cockroach-teamcity avatar Aug 23 '22 15:08 cockroach-teamcity

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 524fd14da3fefcd849f44a835cc5f88f5dbdadcc:

		  |   286.0s        0          184.0           97.3     21.0     35.7     39.8     48.2 payment
		  |   286.0s        0           20.0            9.5     29.4     52.4     65.0     65.0 stockLevel
		  |   287.0s        0           13.0            9.6     60.8     62.9     65.0     65.0 delivery
		  |   287.0s        0          179.0           89.9     32.5     44.0     50.3     58.7 newOrder
		  |   287.0s        0           16.0            9.6      7.6      8.4     10.0     10.0 orderStatus
		  |   287.0s        0          183.0           97.6     19.9     30.4     37.7     41.9 payment
		  |   287.0s        0           15.0            9.6     26.2     41.9     52.4     52.4 stockLevel
		  |   288.0s        0            9.0            9.5     67.1     79.7     79.7     79.7 delivery
		  |   288.0s        0          174.0           90.2     35.7     60.8     75.5     83.9 newOrder
		  |   288.0s        0           18.0            9.7      6.3      8.9     39.8     39.8 orderStatus
		  |   288.0s        0          183.0           97.9     21.0     41.9     60.8     67.1 payment
		  |   288.0s        0           24.0            9.6     24.1     41.9     92.3     92.3 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   289.0s        0           26.0            9.6     60.8     88.1    100.7    100.7 delivery
		  |   289.0s        0          170.0           90.5     32.5     44.0     56.6     62.9 newOrder
		  |   289.0s        0           19.0            9.7      6.6      9.4     10.0     10.0 orderStatus
		  |   289.0s        0          186.0           98.2     21.0     30.4     37.7     39.8 payment
		  |   289.0s        0           25.0            9.7     24.1     50.3     75.5     75.5 stockLevel
		  |   290.0s        0           22.0            9.6     60.8     67.1     75.5     75.5 delivery
		  |   290.0s        0          186.0           90.8     32.5     52.4     58.7     60.8 newOrder
		  |   290.0s        0           12.0            9.7      6.0      8.9     11.0     11.0 orderStatus
		  |   290.0s        0          191.0           98.5     19.9     28.3     39.8     41.9 payment
		  |   290.0s        0           26.0            9.7     30.4     48.2     79.7     79.7 stockLevel
		  |   291.0s        0           21.0            9.7     62.9     92.3     96.5     96.5 delivery
		  |   291.0s        0          214.8           91.2     35.7     50.3     71.3     75.5 newOrder
		  |   291.0s        0           22.0            9.7      6.6     10.0     10.5     10.5 orderStatus
		  |   291.0s        0          172.8           98.8     22.0     32.5     44.0     60.8 payment
		  |   291.0s        0           20.0            9.8     27.3     48.2     52.4     52.4 stockLevel
		  |   292.0s        0           16.0            9.7     60.8     88.1     92.3     92.3 delivery
		  |   292.0s        0          189.0           91.6     31.5     44.0     58.7     60.8 newOrder
		  |   292.0s        0           17.0            9.8      6.8     10.5     16.3     16.3 orderStatus
		  |   292.0s        0          158.0           99.0     19.9     35.7     41.9     46.1 payment
		  |   292.0s        0           16.0            9.8     27.3     50.3     54.5     54.5 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   293.0s        0           27.0            9.8     60.8     88.1     96.5     96.5 delivery
		  |   293.0s        0          193.1           91.9     33.6     50.3     71.3    113.2 newOrder
		  |   293.0s        0           23.0            9.8      6.3     10.5     13.6     13.6 orderStatus
		  |   293.0s        0          198.1           99.3     22.0     39.8     56.6     65.0 payment
		  |   293.0s        0           18.0            9.8     23.1     56.6     65.0     65.0 stockLevel
		Wraps: (8) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 5. Command with error:
		  |   | ``````
		  |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (9) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

cockroach-teamcity avatar Aug 25 '22 15:08 cockroach-teamcity

Artifacts are missing from all but the last one. In the last failure, we see node 1 exit with status code 1 ("unspecified failure"). I can't find anything else in the logs about why the process exited. It doesn't appear to be OOM related, but maybe I'm missing the signs. The last thing in the log is:

I220825 15:35:46.188812 48480 upgrade/upgradecluster/cluster.go:118 ⋮ [n1,intExec=‹×›,migration-mgr] 826 executing bump-cluster-version=22.1-48 on nodes n{1,2,3,4}

I'll try to reproduce using:

GCE_PROJECT=andrei-jepsen ./pkg/cmd/roachtest/roachstress.sh -c10 -u 'tpcc/mixed-headroom/n5cpu16' -- --cpu-quota=1280

nvb avatar Aug 29 '22 14:08 nvb

5 of those 10 runs failed, so this is reproducible. At least two failed due to an OOM.

nvb avatar Aug 29 '22 19:08 nvb

The OOM occurred during the bank import step of the roachtest. At that time, the node which OOMed was seeing many slow raft ready iterations and appears to have been overloaded.

However, the last heap profile doesn't show anything particularly interesting:

(pprof) top
Showing nodes accounting for 779.35MB, 90.53% of 860.82MB total
Dropped 497 nodes (cum <= 4.30MB)
Showing top 10 nodes out of 140
      flat  flat%   sum%        cum   cum%
  173.50MB 20.16% 20.16%   173.50MB 20.16%  github.com/cockroachdb/cockroach/pkg/col/coldata.(*element).setNonInlined
  142.38MB 16.54% 36.70%   142.38MB 16.54%  go.etcd.io/etcd/raft/v3/raftpb.(*Entry).Unmarshal
  137.63MB 15.99% 52.68%   137.63MB 15.99%  github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb.(*ReplicatedEvalResult_AddSSTable).Unmarshal
  128.14MB 14.89% 67.57%   128.14MB 14.89%  github.com/cockroachdb/cockroach/pkg/kv/bulk.(*kvBuf).fits
   97.50MB 11.33% 78.90%    97.50MB 11.33%  github.com/cockroachdb/cockroach/pkg/roachpb.(*Value).ensureRawBytes
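
As a quick sanity check on the profile above, the flat values of the five listed entries can be summed and compared with the reported total. This is only arithmetic over the numbers shown in the pprof output; the variable names are illustrative.

```python
# Flat allocations (MB) of the five entries listed in the pprof output above.
flat_mb = [173.50, 142.38, 137.63, 128.14, 97.50]
total_mb = 860.82  # "of 860.82MB total" from the pprof header

listed_mb = sum(flat_mb)
share_pct = 100 * listed_mb / total_mb

print(f"listed: {listed_mb:.2f}MB = {share_pct:.2f}% of {total_mb}MB")
```

The share agrees with the sum% column of the last listed row (78.90%), and the sampled heap as a whole is well under 1GB.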

nvb avatar Aug 29 '22 19:08 nvb

Could be #73376, which keeps popping up. Unfortunately we may not get around to addressing it for 23.1, but we're considering bumping the priority.

erikgrinaker avatar Aug 29 '22 19:08 erikgrinaker

I was thinking along the same lines, but I also notice a clear inflection point in the rate of failures here, so something regressed about 17 days ago. I'm going to see if a bisect will lead to greater clarity.

nvb avatar Aug 29 '22 19:08 nvb

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ e39111b2e714375faa0facc05e51e8f619a55b21:

		  |   283.0s        0          186.1           96.4     13.6     25.2     32.5     41.9 payment
		  |   283.0s        0           16.0            9.5     18.9     26.2     29.4     29.4 stockLevel
		  |   284.0s        0           14.0            9.5     50.3     62.9     65.0     65.0 delivery
		  |   284.0s        0          164.8           89.5     24.1     30.4     33.6     39.8 newOrder
		  |   284.0s        0           13.0            9.6      7.1      8.9     10.0     10.0 orderStatus
		  |   284.0s        0          182.8           96.7     13.1     16.3     22.0     24.1 payment
		  |   284.0s        0           14.0            9.5     17.8     23.1     28.3     28.3 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   285.0s        0           22.0            9.6     52.4     67.1     67.1     67.1 delivery
		  |   285.0s        0          189.1           89.8     24.1     31.5     41.9     71.3 newOrder
		  |   285.0s        0           16.0            9.6      6.3      7.9      8.9      8.9 orderStatus
		  |   285.0s        0          193.1           97.0     13.6     21.0     41.9     62.9 payment
		  |   285.0s        0           19.0            9.6     17.8     22.0     25.2     25.2 stockLevel
		  |   286.0s        0           16.0            9.6     54.5     83.9     88.1     88.1 delivery
		  |   286.0s        0          170.1           90.1     25.2     52.4     60.8     67.1 newOrder
		  |   286.0s        0           14.0            9.6      6.6      8.1     14.2     14.2 orderStatus
		  |   286.0s        0          193.1           97.3     13.6     29.4     44.0     58.7 payment
		  |   286.0s        0           18.0            9.6     15.7     23.1     25.2     25.2 stockLevel
		  |   287.0s        0           11.0            9.6     54.5     62.9     65.0     65.0 delivery
		  |   287.0s        0          192.0           90.5     24.1     28.3     33.6     37.7 newOrder
		  |   287.0s        0           19.0            9.7      6.8      8.1      8.4      8.4 orderStatus
		  |   287.0s        0          176.0           97.6     13.1     15.7     21.0     31.5 payment
		  |   287.0s        0           15.0            9.6     16.8     23.1     26.2     26.2 stockLevel
		  |   288.0s        0           20.0            9.7     54.5     67.1     75.5     75.5 delivery
		  |   288.0s        0          181.1           90.8     24.1     30.4     33.6     37.7 newOrder
		  |   288.0s        0           19.0            9.7      6.8      8.9     11.0     11.0 orderStatus
		  |   288.0s        0          176.1           97.9     13.6     17.8     22.0     24.1 payment
		  |   288.0s        0           25.0            9.7     18.9     24.1     28.3     28.3 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   289.0s        0           18.0            9.7     56.6     67.1     71.3     71.3 delivery
		  |   289.0s        0          173.0           91.1     25.2     37.7     58.7     65.0 newOrder
		  |   289.0s        0           17.0            9.7      6.0      8.9     13.1     13.1 orderStatus
		  |   289.0s        0          189.0           98.2     14.2     32.5     46.1     54.5 payment
		  |   289.0s        0            7.0            9.7     21.0     24.1     24.1     24.1 stockLevel
		  |   290.0s        0           21.0            9.7     56.6     71.3     75.5     75.5 delivery
		  |   290.0s        0          210.9           91.5     27.3     46.1     52.4     54.5 newOrder
		  |   290.0s        0           10.0            9.7      6.6      9.4      9.4      9.4 orderStatus
		  |   290.0s        0          207.9           98.6     15.2     35.7     44.0     52.4 payment
		  |   290.0s        0           19.0            9.7     19.9     27.3     27.3     27.3 stockLevel
		Wraps: (8) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 5. Command with error:
		  |   | ``````
		  |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (9) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

cockroach-teamcity avatar Sep 08 '22 18:09 cockroach-teamcity

This has not failed again with the original failure mode. However, it failed at the same time as a number of other mixed-version tests 2 days ago. Moving that investigation to Test Eng.

nvb avatar Sep 12 '22 15:09 nvb

The most recent failure seems unrelated to the other mixed-version failures, namely version/mixed/nodes=3 and version/mixed/nodes=5. (Both failed because of the recent change requiring COCKROACH_UPGRADE_TO_DEV_VERSION [1].) Also, this failure doesn't indicate any issue with the upgrade FSM. It appears to be a transient (network) error that causes the background (tpcc) workload to fail, which in turn fails the test. Thus, I am removing the xxx-blocker labels. Full analysis is below.

[1] https://github.com/cockroachdb/cockroach/issues/87687#issuecomment-1243866806

srosenberg avatar Sep 15 '22 01:09 srosenberg

From teardown.log, we can see that the background tpcc workload fails after ~5 minutes:

I220908 17:54:41.085738 1 workload/cli/run.go:427  [-] 1  creating load generator...
I220908 17:54:41.282881 1 workload/cli/run.go:458  [-] 2  creating load generator... done (took 197.141856ms)
I220908 17:59:31.796588 23519 workload/pgx_helpers.go:79  [-] 4  pgx logger [error]: Exec logParams=map[args:[] err:read tcp 10.142.0.10:54240 -> 10.142.0.41:26257: read: connection reset by peer pid:3623803 sql:begin time:143.851154ms]
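
The "~5 minutes" figure can be checked directly from the two timestamps above. The following is a minimal sketch, assuming the log prefix layout is severity letter, yymmdd date, then HH:MM:SS.ffffff; the helper name `parse_crdb_timestamp` is made up for illustration, and the sample lines are trimmed to their prefixes.

```python
from datetime import datetime


def parse_crdb_timestamp(line: str) -> datetime:
    """Parse the leading 'Iyymmdd HH:MM:SS.ffffff' prefix of a log line."""
    date_part, time_part = line.split()[:2]
    # date_part[0] is the severity letter; the remainder is the yymmdd date.
    return datetime.strptime(date_part[1:] + " " + time_part, "%y%m%d %H:%M:%S.%f")


# Prefixes of the first and last teardown.log lines quoted above.
start = parse_crdb_timestamp("I220908 17:54:41.085738 1 workload/cli/run.go:427")
first_error = parse_crdb_timestamp("I220908 17:59:31.796588 23519 workload/pgx_helpers.go:79")

elapsed = first_error - start
print(elapsed)
```

This gives about 4m51s between creating the load generator and the first connection reset.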

Note that 10.142.0.41 maps to n3. Both n1 and n3 appear to experience transient network availability issues:

for i in `seq 1 4`; do echo "n${i}"; grep "failed to connect to n" logs/$i.unredacted/cockroach.log |tail -1;done
n1
I220908 17:54:10.201923 16855 kv/kvserver/closedts/sidetransport/sender.go:795 ⋮ [n1,ctstream=4] 507  side-transport failed to connect to n4: failed to connect to n4 at ‹10.142.0.21:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.142.0.21:26257: connect: connection refused"›
n2
W220908 17:59:33.250317 669937 2@rpc/nodedialer/nodedialer.go:192 ⋮ [n2] 787  unable to connect to n1: failed to connect to n1 at ‹10.142.0.33:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.142.0.33:26257: connect: connection refused"›
n3
I220908 17:54:10.196929 13357 kv/kvserver/closedts/sidetransport/sender.go:795 ⋮ [n3,ctstream=4] 577  side-transport failed to connect to n4: unable to dial n4: ‹breaker open›
n4
I220908 17:59:33.521021 16084 kv/kvserver/closedts/sidetransport/sender.go:795 ⋮ [n4,ctstream=1] 199  side-transport failed to connect to n1: unable to dial n1: ‹breaker open›
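
To make the pattern in these four lines easier to see, here is a rough sketch that extracts who reported a failed connection, to whom, and when. The regular expression is an assumption derived only from the lines shown above, and the sample lines are abbreviated copies of them (the text after the first target node is trimmed).

```python
import re

# Abbreviated copies of the four lines printed above.
LINES = [
    "I220908 17:54:10.201923 16855 kv/kvserver/closedts/sidetransport/sender.go:795 ⋮ [n1,ctstream=4] 507  side-transport failed to connect to n4",
    "W220908 17:59:33.250317 669937 2@rpc/nodedialer/nodedialer.go:192 ⋮ [n2] 787  unable to connect to n1",
    "I220908 17:54:10.196929 13357 kv/kvserver/closedts/sidetransport/sender.go:795 ⋮ [n3,ctstream=4] 577  side-transport failed to connect to n4",
    "I220908 17:59:33.521021 16084 kv/kvserver/closedts/sidetransport/sender.go:795 ⋮ [n4,ctstream=1] 199  side-transport failed to connect to n1",
]

LINE_RE = re.compile(
    r"^[IWEF]\d{6} (\d{2}:\d{2}:\d{2})"  # severity, date, wall-clock time
    r".*?\[n(\d+)[,\]]"                  # reporting node tag, e.g. [n1,ctstream=4]
    r".*?to connect to n(\d+)"           # node that could not be reached
)


def connection_failures(lines):
    """Return (time, reporting node, unreachable node) for each matching line."""
    found = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            found.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return found


failures = connection_failures(LINES)
for when, reporter, target in failures:
    print(f"{when}  n{reporter} -> n{target}")
```

Both entries stamped 17:59:33 (from n2 and n4) point at n1, while the last entries on n1 and n3 date from 17:54:10, well before the workload failure.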

At the time of the workload failure (17:59:31), all the nodes are in the mixed-version state, executing migration jobs. (In the test harness, this is essentially the final step, tpccBackgroundStepper.wait [1].) From the node logs, we can see that the active cluster version is 1000022.1-48 on n2 and n4, and 1000022.1-47(fence) on n1 and n3:

for i in `seq 1 4`; do echo "n${i}"; grep "active cluster version setting" logs/$i.unredacted/cockroach.log |tail -1;done
n1
I220908 17:59:30.780410 576281 server/migration.go:149 ⋮ [n1,bump-cluster-version] 1138  active cluster version setting is now ‹1000022.1-47(fence)› (up from ‹1000022.1-46›)
n2
I220908 17:59:30.993236 666404 server/migration.go:149 ⋮ [n2,bump-cluster-version] 755  active cluster version setting is now ‹1000022.1-48› (up from ‹1000022.1-47(fence)›)
n3
I220908 17:59:30.780334 716309 server/migration.go:149 ⋮ [n3,bump-cluster-version] 732  active cluster version setting is now ‹1000022.1-47(fence)› (up from ‹1000022.1-46›)
n4
I220908 17:59:31.189051 430132 server/migration.go:149 ⋮ [n4,bump-cluster-version] 159  active cluster version setting is now ‹1000022.1-48› (up from ‹1000022.1-47(fence)›)
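
The same grouping can be derived mechanically. This is a small sketch, assuming the message format seen in the four lines above (including the ‹…› redaction markers); `nodes_by_version` is a hypothetical helper, not part of any CockroachDB tooling.

```python
import re
from collections import defaultdict

# The last "active cluster version" line from each node, as printed above.
LINES = [
    "I220908 17:59:30.780410 576281 server/migration.go:149 ⋮ [n1,bump-cluster-version] 1138  active cluster version setting is now ‹1000022.1-47(fence)› (up from ‹1000022.1-46›)",
    "I220908 17:59:30.993236 666404 server/migration.go:149 ⋮ [n2,bump-cluster-version] 755  active cluster version setting is now ‹1000022.1-48› (up from ‹1000022.1-47(fence)›)",
    "I220908 17:59:30.780334 716309 server/migration.go:149 ⋮ [n3,bump-cluster-version] 732  active cluster version setting is now ‹1000022.1-47(fence)› (up from ‹1000022.1-46›)",
    "I220908 17:59:31.189051 430132 server/migration.go:149 ⋮ [n4,bump-cluster-version] 159  active cluster version setting is now ‹1000022.1-48› (up from ‹1000022.1-47(fence)›)",
]

VERSION_RE = re.compile(r"\[n(\d+),bump-cluster-version\].*setting is now ‹([^›]+)›")


def nodes_by_version(lines):
    """Group node IDs by the last cluster version each node logged."""
    latest = {}
    for line in lines:
        m = VERSION_RE.search(line)
        if m:
            latest[int(m.group(1))] = m.group(2)  # later lines overwrite earlier ones
    grouped = defaultdict(list)
    for node, version in sorted(latest.items()):
        grouped[version].append(node)
    return dict(grouped)


versions = nodes_by_version(LINES)
for version, nodes in versions.items():
    print(version, nodes)
```

The two nodes still at the fence version are n1 and n3, the same two nodes that exit.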

The workload failure caused the test failure: t.Fatal [2] is invoked after the monitor detects an error (via WaitE). Because every roachtest failure triggers collectClusterArtifacts, we attempt to grab the logs from every node. However, as teardown.log shows, some of the logs could not be transferred successfully. On closer examination, it appears that errors are swallowed inside cluster.Get [3] (l.File is non-nil when invoked from roachtest and one of the lines contains an error message). The teardown.log entry:

teardown: 17:59:35 cluster.go:1118: failed to fetch logs: cluster.Get: get logs failed

Thus, it's technically possible that some of the logs were truncated. However, it's highly unlikely that both n1's and n3's cockroach.log got truncated. According to journalctl, both nodes exit with status 1, n3 at 17:59:31 and n1 at 17:59:32:

Sep 08 17:59:31 teamcity-6383257-1662614354-100-n5cpu16-0003 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE
Sep 08 17:59:32 teamcity-6383257-1662614354-100-n5cpu16-0001 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE
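
For reference, the exit order can be read out of these two journalctl lines with a short sketch like the one below. It assumes that the four-digit hostname suffix (0003, 0001) is the node index, which matches the n3/n1 attribution in this comment but is not stated in the output itself; `process_exits` is an illustrative helper.

```python
import re

# The two journalctl lines quoted above.
JOURNAL_LINES = [
    "Sep 08 17:59:31 teamcity-6383257-1662614354-100-n5cpu16-0003 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE",
    "Sep 08 17:59:32 teamcity-6383257-1662614354-100-n5cpu16-0001 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE",
]

EXIT_RE = re.compile(
    r"^\w{3} \d{2} (\d{2}:\d{2}:\d{2}) \S+-(\d{4}) systemd\[\d+\]: "
    r"cockroach\.service: Main process exited, code=(\w+), status=(\d+)/"
)


def process_exits(lines):
    """Return (node index, time, systemd exit code kind, exit status) per line."""
    exits = []
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            # Assumption: the numeric hostname suffix is the node index.
            exits.append((int(m.group(2)), m.group(1), m.group(3), int(m.group(4))))
    return exits


exits = process_exits(JOURNAL_LINES)
for node, when, code, status in exits:
    print(f"n{node} exited at {when} (code={code}, status={status})")
```

systemd reports code=exited when the main process terminates on its own and code=killed when it is terminated by a signal, so both entries describe a process that returned exit status 1 by itself.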

Note that neither process was killed, yet there is no trace of any panic in the logs. It appears that both nodes exited with UnspecifiedError. Oddly, the message "Failed running %q\n" [4] is not in any of the logs. These are the last few messages in each node's cockroach.log:

tail -5 logs/1.unredacted/cockroach.log

I220908 17:59:30.787980 49890 upgrade/upgradecluster/cluster.go:118 ⋮ [n1,client=35.196.70.170:33426,user=root,migration-mgr] 1142  executing bump-cluster-version=1000022.1-48 on nodes n{1,2,3,4}
I220908 17:59:30.875387 573820 sql/gcjob/gc_job_utils.go:58 ⋮ [n1,job=794917503914573825] 1143  marked index 3 as GC'd
I220908 17:59:30.881949 573820 sql/gcjob/gc_job_utils.go:289 ⋮ [n1,job=794917503914573825] 1144  updated progress payload: ‹indexes:<index_id:3 status:CLEARED > ranges_unsplit_done:true›
I220908 17:59:30.886290 573820 sql/gcjob/gc_job_utils.go:296 ⋮ [n1,job=794917503914573825] 1145  updated running status: ‹waiting for GC TTL›
I220908 17:59:30.889058 573820 jobs/registry.go:1205 ⋮ [n1] 1146  SCHEMA CHANGE GC job 794917503914573825: stepping through state succeeded with error: <nil>

tail -5 logs/3.unredacted/cockroach.log

I220908 17:59:30.765627 716258 server/migration.go:149 ⋮ [n3,bump-cluster-version] 730  active cluster version setting is now ‹1000022.1-45(fence)› (up from ‹1000022.1-44›)
I220908 17:59:30.770121 716191 server/migration.go:149 ⋮ [n3,bump-cluster-version] 731  active cluster version setting is now ‹1000022.1-46› (up from ‹1000022.1-45(fence)›)
I220908 17:59:30.780334 716309 server/migration.go:149 ⋮ [n3,bump-cluster-version] 732  active cluster version setting is now ‹1000022.1-47(fence)› (up from ‹1000022.1-46›)
I220908 17:59:31.047900 44899 jobs/wait.go:152 ⋮ [n3,intExec=‹set-version›,migration-mgr] 733  waited for 1 [794916516998709249] queued jobs to complete 4m44.045664367s
I220908 17:59:31.049316 44899 upgrade/upgradecluster/cluster.go:118 ⋮ [n3,intExec=‹set-version›,migration-mgr] 734  executing bump-cluster-version=1000022.1-17(fence) on nodes n{1,2,3,4}

[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/tests/tpcc.go#L431
[2] https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/tests/mixed_version_jobs.go#L73
[3] https://github.com/cockroachdb/cockroach/blob/master/pkg/roachprod/install/cluster_synced.go#L2007
[4] https://github.com/cockroachdb/cockroach/blob/master/pkg/cli/cli.go#L73

srosenberg avatar Sep 15 '22 02:09 srosenberg

Neither the system metrics nor the application metrics show anything anomalous. All nodes have ample system resources. The graphs below corroborate that both n1 and n3 terminate at about 17:59:31 while the other two nodes continue to execute:

[image: tpcc_mixed_workload_fails_network]

srosenberg avatar Sep 15 '22 02:09 srosenberg

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 726cf22b9f06b766d857b4617dec0df18d1e5cd0:

		  |   283.0s        0          203.1           95.9     21.0     29.4     37.7     44.0 payment
		  |   283.0s        0           19.0            9.6     29.4     39.8     46.1     46.1 stockLevel
		  |   284.0s        0           16.0            9.4     65.0     75.5    100.7    100.7 delivery
		  |   284.0s        0          182.0           89.4     32.5     46.1     52.4     62.9 newOrder
		  |   284.0s        0           11.0            9.6      6.0      6.8      8.4      8.4 orderStatus
		  |   284.0s        0          197.0           96.2     19.9     27.3     32.5     35.7 payment
		  |   284.0s        0           19.0            9.6     29.4     50.3     56.6     56.6 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   285.0s        0           15.0            9.4     60.8     71.3     75.5     75.5 delivery
		  |   285.0s        0          205.8           89.8     32.5     41.9     44.0     48.2 newOrder
		  |   285.0s        0           16.0            9.6      6.3     11.5     11.5     11.5 orderStatus
		  |   285.0s        0          197.8           96.6     21.0     31.5     46.1     46.1 payment
		  |   285.0s        0           16.0            9.7     23.1     35.7     52.4     52.4 stockLevel
		  |   286.0s        0           15.0            9.4     67.1     79.7     83.9     83.9 delivery
		  |   286.0s        0          183.1           90.1     31.5     39.8     50.3     54.5 newOrder
		  |   286.0s        0           15.0            9.6      7.3      8.9     10.5     10.5 orderStatus
		  |   286.0s        0          178.1           96.8     21.0     28.3     35.7     56.6 payment
		  |   286.0s        0           16.0            9.7     30.4     41.9     46.1     46.1 stockLevel
		  |   287.0s        0           16.0            9.4     56.6     62.9     65.0     65.0 delivery
		  |   287.0s        0          192.1           90.5     32.5     41.9     48.2     52.4 newOrder
		  |   287.0s        0           18.0            9.6      6.8      8.9     10.5     10.5 orderStatus
		  |   287.0s        0          189.1           97.2     21.0     28.3     31.5     32.5 payment
		  |   287.0s        0           18.0            9.7     25.2     46.1     50.3     50.3 stockLevel
		  |   288.0s        0           18.0            9.5     62.9     71.3     75.5     75.5 delivery
		  |   288.0s        0          193.0           90.8     33.6     44.0     56.6     62.9 newOrder
		  |   288.0s        0           20.0            9.7      6.6      8.9     10.0     10.0 orderStatus
		  |   288.0s        0          186.0           97.5     21.0     29.4     32.5     39.8 payment
		  |   288.0s        0           23.0            9.8     23.1     39.8     46.1     46.1 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   289.0s        0           16.0            9.5     65.0     83.9    104.9    104.9 delivery
		  |   289.0s        0          174.0           91.1     35.7     60.8     83.9     96.5 newOrder
		  |   289.0s        0           15.0            9.7      6.6      9.4     10.5     10.5 orderStatus
		  |   289.0s        0          174.0           97.7     21.0     41.9     67.1     71.3 payment
		  |   289.0s        0           18.0            9.8     23.1     35.7     52.4     52.4 stockLevel
		  |   290.0s        0           11.0            9.5     60.8     67.1     71.3     71.3 delivery
		  |   290.0s        0          182.9           91.4     33.6     46.1     50.3     71.3 newOrder
		  |   290.0s        0           19.0            9.7      6.8      9.4     10.0     10.0 orderStatus
		  |   290.0s        0          195.9           98.1     21.0     29.4     32.5     35.7 payment
		  |   290.0s        0           17.0            9.8     26.2     37.7     44.0     44.0 stockLevel
		Wraps: (8) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 5. Command with error:
		  |   | ``````
		  |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (9) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=true , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

  • #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-21.2]

This test on roachdash | Improve this report!

cockroach-teamcity avatar Sep 22 '22 17:09 cockroach-teamcity

cc @cockroachdb/test-eng

blathers-crl[bot] avatar Sep 26 '22 20:09 blathers-crl[bot]

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ a0bfa6dafcc206301d3a21887c374db63b377075:

		  |    65.0s        0           21.0           18.4     52.4     60.8     71.3     71.3 delivery
		  |    65.0s        0          196.1          194.2     26.2     32.5     39.8     46.1 newOrder
		  |    65.0s        0           11.0           18.1      7.6      8.9      9.4      9.4 orderStatus
		  |    65.0s        0          195.1          187.9     14.7     19.9     23.1     31.5 payment
		  |    65.0s        0           20.0           18.3     18.9     27.3     39.8     39.8 stockLevel
		  |    66.0s        0           19.0           18.5     54.5     88.1     88.1     88.1 delivery
		  |    66.0s        0          192.9          194.2     27.3     39.8     48.2     56.6 newOrder
		  |    66.0s        0           15.0           18.0      6.3      7.6      8.4      8.4 orderStatus
		  |    66.0s        0          197.9          188.0     16.3     23.1     25.2     32.5 payment
		  |    66.0s        0           16.0           18.3     18.9     39.8     48.2     48.2 stockLevel
		  |    67.0s        0           16.0           18.4     52.4     58.7     60.8     60.8 delivery
		  |    67.0s        0          196.6          194.3     27.3     37.7     46.1     52.4 newOrder
		  |    67.0s        0           19.0           18.0      5.8      7.3      7.6      7.6 orderStatus
		  |    67.0s        0          184.7          188.0     15.7     23.1     25.2     30.4 payment
		  |    67.0s        0           22.0           18.3     15.2     29.4     48.2     48.2 stockLevel
		  |    68.0s        0           16.0           18.4     54.5     65.0     65.0     65.0 delivery
		  |    68.0s        0          188.5          194.2     25.2     35.7     41.9     44.0 newOrder
		  |    68.0s        0           20.0           18.1      5.8      8.9      8.9      8.9 orderStatus
		  |    68.0s        0          191.5          188.0     14.7     21.0     27.3     31.5 payment
		  |    68.0s        0           15.0           18.3     18.9     27.3     41.9     41.9 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |    69.0s        0           14.0           18.3     50.3     58.7     62.9     62.9 delivery
		  |    69.0s        0          213.7          194.5     26.2     37.7     52.4     54.5 newOrder
		  |    69.0s        0           19.0           18.1      5.5      7.6      7.9      7.9 orderStatus
		  |    69.0s        0          163.7          187.7     15.7     26.2     32.5     35.7 payment
		  |    69.0s        0           22.0           18.3     24.1     54.5     58.7     58.7 stockLevel
		  |    70.0s        0           17.0           18.3     52.4     65.0     65.0     65.0 delivery
		  |    70.0s        0          185.0          194.3     27.3     39.8     41.9     46.1 newOrder
		  |    70.0s        0           15.0           18.0      6.0      7.1      7.3      7.3 orderStatus
		  |    70.0s        0          164.0          187.3     15.7     24.1     29.4     32.5 payment
		  |    70.0s        0           18.0           18.3     16.3     31.5     46.1     46.1 stockLevel
		  |    71.0s        0           11.0           18.2     56.6     67.1     67.1     67.1 delivery
		  |    71.0s        0          218.3          194.7     29.4     56.6     79.7     79.7 newOrder
		  |    71.0s        0           22.0           18.1      6.6     12.1     14.2     14.2 orderStatus
		  |    71.0s        0          199.2          187.5     17.8     32.5     46.1     65.0 payment
		  |    71.0s        0           13.0           18.3     22.0     41.9     48.2     48.2 stockLevel
		  |    72.0s        0            7.0           18.0     56.6     67.1     67.1     67.1 delivery
		  |    72.0s        0          111.0          193.5     28.3     37.7     41.9     44.0 newOrder
		  |    72.0s        0            7.0           17.9      6.6      7.3      7.3      7.3 orderStatus
		  |    72.0s        0           93.0          186.2     17.8     23.1     28.3     32.5 payment
		  |    72.0s        0            6.0           18.1     19.9     52.4     52.4     52.4 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	versionupgrade.go:530,versionupgrade.go:197,tpcc.go:432,test_runner.go:928: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0
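
The per-second workload output above is easier to compare across runs once it is parsed. Below is a minimal sketch, assuming the column layout shown in the `_elapsed___errors__...` header; the `Tick` type, the `parse_tick` and `below_cumulative` helpers, and the 75% threshold are hypothetical names and values chosen for illustration, not part of roachtest or the workload tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tick:
    elapsed_s: float
    errors: int
    ops_inst: float   # ops/sec(inst)
    ops_cum: float    # ops/sec(cum)
    p50_ms: float
    p95_ms: float
    p99_ms: float
    pmax_ms: float
    op: str


def parse_tick(line: str) -> Optional[Tick]:
    """Parse one per-second line; return None for headers and other text."""
    fields = line.split("|", 1)[-1].split()
    if len(fields) != 9 or not fields[0].endswith("s"):
        return None
    try:
        nums = [float(v) for v in [fields[0][:-1]] + fields[1:8]]
    except ValueError:
        return None
    return Tick(nums[0], int(nums[1]), *nums[2:], fields[8])


def below_cumulative(tick: Tick, fraction: float = 0.75) -> bool:
    """True when the instantaneous rate falls below a fraction of the cumulative rate."""
    return tick.ops_inst < fraction * tick.ops_cum


# Lines copied from the report above (leading tabs dropped).
sample = [
    "  |    71.0s        0          218.3          194.7     29.4     56.6     79.7     79.7 newOrder",
    "  |    72.0s        0          111.0          193.5     28.3     37.7     41.9     44.0 newOrder",
    "  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)",
]
for line in sample:
    tick = parse_tick(line)
    if tick is not None and below_cumulative(tick):
        print(f"{tick.elapsed_s:.0f}s {tick.op}: {tick.ops_inst} ops/s vs {tick.ops_cum} cumulative")
```

On this sample only the 72.0s `newOrder` line is flagged, which matches the visible drop in the last second before the workload exits.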

Same failure on other branches

  • #88668 roachtest: tpcc/mixed-headroom/n5cpu16 failed [C-test-failure O-roachtest O-robot blocks-22.2.0-beta.2 branch-release-22.2 release-blocker]
  • #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-21.2]

cockroach-teamcity avatar Sep 27 '22 15:09 cockroach-teamcity

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 84384b50c023dd4c05fff76af85a6975f5d2b0ab:

		  |   252.0s        0          162.0           79.1     25.2     35.7     46.1     54.5 newOrder
		  |   252.0s        0           19.0            8.8      7.3      8.4      8.9      8.9 orderStatus
		  |   252.0s        0          157.0           86.1     13.6     26.2     30.4     32.5 payment
		  |   252.0s        0           13.0            8.8     16.3     25.2     31.5     31.5 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   253.0s        0           10.0            8.5     54.5     83.9     83.9     83.9 delivery
		  |   253.0s        0          160.1           79.4     23.1     28.3     35.7     41.9 newOrder
		  |   253.0s        0           23.0            8.8      5.8      8.9     15.7     15.7 orderStatus
		  |   253.0s        0          143.1           86.3     13.1     24.1     37.7     39.8 payment
		  |   253.0s        0           12.0            8.8     15.7     31.5     31.5     31.5 stockLevel
		  |   254.0s        0           15.0            8.5     54.5     67.1     75.5     75.5 delivery
		  |   254.0s        0          139.0           79.6     25.2     35.7     46.1     75.5 newOrder
		  |   254.0s        0           10.0            8.9      6.8      8.9      8.9      8.9 orderStatus
		  |   254.0s        0          173.0           86.6     13.6     22.0     31.5     48.2 payment
		  |   254.0s        0           15.0            8.8     22.0     31.5     50.3     50.3 stockLevel
		  |   255.0s        0            7.0            8.5     54.5     56.6     56.6     56.6 delivery
		  |   255.0s        0          156.0           79.9     25.2     33.6     39.8     50.3 newOrder
		  |   255.0s        0           14.0            8.9      6.3      8.4     10.5     10.5 orderStatus
		  |   255.0s        0          181.0           87.0     13.6     25.2     28.3     33.6 payment
		  |   255.0s        0           24.0            8.9     13.1     28.3     31.5     31.5 stockLevel
		  |   256.0s        0           10.0            8.5     50.3    113.2    113.2    113.2 delivery
		  |   256.0s        0          140.0           80.2     25.2     44.0     48.2     54.5 newOrder
		  |   256.0s        0           12.0            8.9      6.0      7.9      8.9      8.9 orderStatus
		  |   256.0s        0          191.9           87.4     14.2     35.7     46.1     48.2 payment
		  |   256.0s        0           13.0            8.9     17.8     27.3     27.3     27.3 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   257.0s        0           12.0            8.5     56.6     71.3     75.5     75.5 delivery
		  |   257.0s        0          184.1           80.6     26.2     32.5     39.8     41.9 newOrder
		  |   257.0s        0            9.0            8.9      6.8      8.9      8.9      8.9 orderStatus
		  |   257.0s        0          152.1           87.7     14.7     22.0     29.4     37.7 payment
		  |   257.0s        0           13.0            8.9     19.9     27.3     39.8     39.8 stockLevel
		  |   258.0s        0           24.0            8.6     56.6     71.3     83.9     83.9 delivery
		  |   258.0s        0          175.8           81.0     25.2     37.7     46.1     62.9 newOrder
		  |   258.0s        0           19.0            8.9      7.9     11.0     11.0     11.0 orderStatus
		  |   258.0s        0          165.8           88.0     14.2     23.1     39.8     41.9 payment
		  |   258.0s        0           15.0            8.9     18.9     24.1     41.9     41.9 stockLevel
		  |   259.0s        0           12.0            8.6     54.5     62.9     79.7     79.7 delivery
		  |   259.0s        0          137.0           81.2     25.2     33.6     41.9     46.1 newOrder
		  |   259.0s        0           17.0            9.0      7.9     10.5     10.5     10.5 orderStatus
		  |   259.0s        0          156.0           88.2     13.6     19.9     24.1     37.7 payment
		  |   259.0s        0           17.0            8.9     18.9     28.3     39.8     39.8 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	versionupgrade.go:530,versionupgrade.go:197,tpcc.go:432,test_runner.go:928: pq: query execution canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=true , ROACHTEST_fs=zfs , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Same failure on other branches

  • #88668 roachtest: tpcc/mixed-headroom/n5cpu16 failed [C-test-failure O-roachtest O-robot branch-release-22.2]
  • #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-21.2]

cockroach-teamcity avatar Oct 03 '22 16:10 cockroach-teamcity

Latest failure has the same failure mode,

Oct 03 15:59:13 teamcity-6749797-1664774404-105-n5cpu16-0003 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE
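
To find every occurrence of this failure mode across the collected journalctl files, a scan along these lines may help. This is a sketch that assumes the systemd message format shown above; `EXIT_RE` and `find_exits` are hypothetical names, and the pattern has only been checked against the two journalctl lines quoted in this thread.

```python
import re

# Matches systemd's "Main process exited" message, with or without a
# "<file>:" prefix such as grep adds.
EXIT_RE = re.compile(
    r"(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) systemd\[1\]: "
    r"(?P<unit>\S+): Main process exited, code=(?P<code>\w+), status=(?P<status>\S+)"
)


def find_exits(lines):
    """Return one dict per matching line with ts, host, unit, code and status."""
    return [m.groupdict() for m in map(EXIT_RE.search, lines) if m]


lines = [
    "Oct 03 15:59:13 teamcity-6749797-1664774404-105-n5cpu16-0003 systemd[1]: "
    "cockroach.service: Main process exited, code=exited, status=1/FAILURE",
]
for hit in find_exits(lines):
    print(hit["ts"], hit["host"], hit["unit"], hit["status"])
```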

Ongoing internal investigation: https://cockroachlabs.slack.com/archives/C01CDD4HRC5/p1664819770906019?thread_ts=1664295784.890119&cid=C01CDD4HRC5

srosenberg avatar Oct 03 '22 18:10 srosenberg

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ e06d2286b011096526eda7f2d7f7bb7acea0ae84:

test artifacts and logs in: /artifacts/tpcc/mixed-headroom/n5cpu16/run_1
(versionupgrade.go:533).setClusterSettingVersionStep: pq: rpc error: code = Unavailable desc = error reading from server: read tcp 10.142.1.113:47054->10.142.1.79:26257: read: connection reset by peer
(monitor.go:127).Wait: monitor failure: monitor task failed: output in run_144818.024420287_n5_cockroach_workload_run_tpcc: ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4} returned: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=true , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Same failure on other branches

  • #88668 roachtest: tpcc/mixed-headroom/n5cpu16 failed [C-test-failure O-roachtest O-robot branch-release-22.2]
  • #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-21.2]

cockroach-teamcity avatar Oct 08 '22 14:10 cockroach-teamcity

Yet another example of a node exiting with status 1 without any stack trace.

In test.log,

14:48:18 tpcc.go:254: test worker status: running tpcc worker=0 warehouses=909 ramp=5m0s duration=2h0m0s on {pgurl:1-4} (<1m0s)

In journalctl.txt,

2.journalctl.txt:Oct 08 14:52:40 teamcity-6837129-1665206365-100-n5cpu16-0002 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE

In cockroach-pebble, the last upgraded format version is 008,

I221008 14:48:10.546483 46770 3@pebble/event.go:645 ⋮ [n2,pebble,s2] 5555  upgraded to format version: ‹008›
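
The three excerpts above use three different timestamp formats, so lining them up takes a little care. Here is a small sketch that places them on one timeline; it assumes all three sources log in the same time zone and that the journalctl entry is from 2022 (journalctl omits the year). The helper names are hypothetical.

```python
from datetime import datetime

# Assumption: the run date, taken from the cockroach log prefix "221008".
RUN_DATE = "2022-10-08"


def from_test_log(hms):
    """test.log carries only a wall-clock time, e.g. '14:48:18'."""
    return datetime.strptime(f"{RUN_DATE} {hms}", "%Y-%m-%d %H:%M:%S")


def from_journalctl(stamp, year=2022):
    """journalctl short format, e.g. 'Oct 08 14:52:40' (no year)."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def from_crdb_log(line):
    """cockroach log prefix, e.g. 'I221008 14:48:10.546483 ...'."""
    sev_date, clock = line.split()[:2]
    return datetime.strptime(f"{sev_date[1:]} {clock}", "%y%m%d %H:%M:%S.%f")


events = [
    ("pebble format upgrade", from_crdb_log("I221008 14:48:10.546483 46770 3@pebble/event.go:645")),
    ("tpcc workload started", from_test_log("14:48:18")),
    ("cockroach.service exited", from_journalctl("Oct 08 14:52:40")),
]
events.sort(key=lambda e: e[1])
start = events[0][1]
for label, when in events:
    print(f"+{when - start}  {label}")
```

Under these assumptions the process exits 4m22s after the workload starts and roughly 4.5 minutes after the last format upgrade.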

srosenberg avatar Oct 10 '22 23:10 srosenberg

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 7be0b20edbc336200c1510a9c6f1d76ae2f92c3a:

test artifacts and logs in: /artifacts/tpcc/mixed-headroom/n5cpu16/run_1
(monitor.go:127).Wait: monitor failure: monitor task failed: output in run_142544.008795915_n1_v2216cockroach_workload_fixtures_import_bank: v22.1.6/cockroach workload fixtures import bank --payload-bytes=10240 --rows=32552083 --seed=4 --db=bigbank returned: SSH_PROBLEM: exit status 255
(test_runner.go:1062).teardownTest: test timed out (0s)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=zfs , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Same failure on other branches

  • #89755 roachtest: tpcc/mixed-headroom/n5cpu16 failed [C-test-failure O-roachtest O-robot T-testeng branch-release-22.2.0 release-blocker]
  • #88668 roachtest: tpcc/mixed-headroom/n5cpu16 failed [C-test-failure O-roachtest O-robot branch-release-22.2]
  • #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-21.2]

cockroach-teamcity avatar Oct 16 '22 00:10 cockroach-teamcity

The last failure is an entirely different failure mode. The bank import step appears to run for hours until it is killed by the test timeout.

The preceding step to import tpcc completes @14:25,

I221015 14:25:41.948258 1 ccl/workloadccl/fixture.go:326  [-] 11  imported 62 GiB bytes in 9 tables (took 4m59.344260633s, 212.78 MiB/s)

The bank import starts immediately after,

run_142544.008795915_n1_v2216cockroach_workload_fixtures_import_bank: 14:25:44 cluster.go:291: > v22.1.6/cockroach workload fixtures import bank --payload-bytes=10240 --rows=32552083 --seed=4 --db=bigbank
I221015 14:25:44.936592 1 ccl/workloadccl/fixture.go:318  [-] 1  starting import of 1 tables
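
For a sense of scale, the numbers in these two log excerpts allow a rough estimate of how long the bank import might be expected to take. This is a back-of-the-envelope sketch only: it assumes each bank row is dominated by its `--payload-bytes` payload (ignoring keys, encoding overhead and compression) and that the bank import would sustain the same rate the tpcc import reported.

```python
MIB = 2**20
GIB = 2**30

# Values copied from the tpcc import log line and the bank import command.
tpcc_rate_mib_s = 212.78
tpcc_seconds = 4 * 60 + 59.344260633   # "took 4m59.344260633s"
rows = 32_552_083                      # --rows
payload_bytes = 10_240                 # --payload-bytes

# Cross-check: rate * duration should land near the reported 62 GiB.
tpcc_gib = tpcc_rate_mib_s * tpcc_seconds * MIB / GIB

# Assumption: table size is approximately rows * payload size.
bank_gib = rows * payload_bytes / GIB
bank_minutes = bank_gib * GIB / MIB / tpcc_rate_mib_s / 60

print(f"tpcc import: ~{tpcc_gib:.1f} GiB")
print(f"bank import: ~{bank_gib:.0f} GiB, ~{bank_minutes:.0f} min at the tpcc rate")
```

Under those assumptions the bank table is roughly 310 GiB and would take on the order of 25 minutes, not hours.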

All nodes appear to be live for the remaining ~10 hours,

[Graphs: cpu_ram, network]

srosenberg avatar Oct 17 '22 02:10 srosenberg

@stevendanna Would you mind taking a look at the logs to see what could possibly have caused the import to run for ~10 hours? The last warning message concerning the import is @15:35,

logs/3.unredacted/cockroach.log:W221015 15:35:13.557144 86844 kv/bulk/sst_batcher.go:469 ⋮ [n3,f‹d1df2c12›,job=805350752177750017] 25254  ‹bank rows› failed to scatter	: existing range size 10496962 exceeds specified limit 4194304

On n2 we see these warnings every minute, starting @15:08, ~6 minutes after the split is initiated,

logs/2.unredacted/cockroach.log:I221015 15:02:47.223326 373620 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n2,s2,r6742/1:‹/Table/181/1/{284963…-325520…}›] 18370  initiating a split of this range at key ‹/Table/181/1/28503022› [r6746] (‹manual›)‹›
logs/2.unredacted/cockroach.log:I221015 15:02:47.346852 373677 kv/kvserver/pkg/kv/kvserver/replica_command.go:2260 ⋮ [n2,s2,r6746/1:‹/Table/181/1/{285030…-325520…}›] 18375  change replicas (add [(n4,s4):4LEARNER] remove []): existing descriptor r6746:‹/Table/181/1/{28503022-32552000}› [(n2,s2):1, (n1,s1):2, (n3,s3):3, next=4, gen=3665, sticky=1665846767.222826180,0]
logs/2.unredacted/cockroach.log:W221015 15:08:49.363284 680702 kv/kvserver/pkg/kv/kvserver/merge_queue.go:411 ⋮ [n2,merge,s2,r6746/1:‹/Table/181/1/{285030…-325520…}›] 19925  ‹kv/kvserver/pkg/kv/kvserver/replica_command.go›:810: merge failed: fetching current range descriptor value: context deadline exceeded
logs/2.unredacted/cockroach.log:W221015 15:09:49.364832 741327 kv/kvclient/kvcoord/dist_sender.go:1602 ⋮ [n2,merge,s2,r6746/1:‹/Table/181/1/{285030…-325520…}›] 20262  slow range RPC: have been waiting 60.00s (1 attempts) for RPC Get [‹/Local/Range/Table/181/1/28503022/RangeDescriptor›,‹/Min›), [txn: c5a14092], [can-forward-ts] to r6746:‹/Table/181/1/{28503022-32552000}› [(n2,s2):1, (n1,s1):2, (n3,s3):3, next=4, gen=3665, sticky=1665846767.222826180,0]; resp: ‹(err: context deadline exceeded: "merge" meta={id=c5a14092 key=/Local/Range/Table/181/1/28503022/RangeDescriptor pri=0.00562966 epo=0 ts=1665846529.364048180,0 min=1665846529.364048180,0 seq=0} lock=true stat=PENDING rts=1665846529.364048180,0 wto=false gul=1665846529.864048180,0)›

and persisting until the timeout @00:19:50,

W221016 00:19:50.341423 741327 kv/kvserver/pkg/kv/kvserver/merge_queue.go:411 ⋮ [n2,merge,s2,r6746/1:‹/Table/181/1/{285030…-325520…}›] 28905  ‹kv/kvserver/pkg/kv/kvserver/replica_command.go›:810: merge failed: fetching current range descriptor value: context deadline exceeded
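
The intervals quoted above can be checked directly from the log prefixes. The sketch below assumes the cockroach log prefix layout seen in these excerpts (severity letter, `yymmdd`, time with microseconds, goroutine id, source file); `HEADER` and `parse_header` are hypothetical names, and the sample strings are truncated copies of the lines in this thread.

```python
import re
from datetime import datetime

HEADER = re.compile(r"([IWEF])(\d{6} \d{2}:\d{2}:\d{2}\.\d{6}) (\d+) (\S+)")


def parse_header(line):
    """Return (severity, timestamp, goroutine id, source location) or None."""
    m = HEADER.search(line)
    if m is None:
        return None
    sev, ts, goroutine, src = m.groups()
    return sev, datetime.strptime(ts, "%y%m%d %H:%M:%S.%f"), int(goroutine), src


# Prefixes copied from the excerpts above; message bodies omitted.
lines = {
    "import_start": "I221015 14:25:44.936592 1 ccl/workloadccl/fixture.go:318",
    "split": "logs/2.unredacted/cockroach.log:I221015 15:02:47.223326 373620 kv/kvserver/pkg/kv/kvserver/replica_command.go:420",
    "first_merge_failure": "logs/2.unredacted/cockroach.log:W221015 15:08:49.363284 680702 kv/kvserver/pkg/kv/kvserver/merge_queue.go:411",
    "last_merge_failure": "W221016 00:19:50.341423 741327 kv/kvserver/pkg/kv/kvserver/merge_queue.go:411",
}
when = {name: parse_header(line)[1] for name, line in lines.items()}

print("split -> first merge failure:", when["first_merge_failure"] - when["split"])
print("first -> last merge failure: ", when["last_merge_failure"] - when["first_merge_failure"])
print("import start -> last failure:", when["last_merge_failure"] - when["import_start"])
```

Because the prefix includes the date, the span across midnight needs no special handling. The first interval comes out at just over 6 minutes and the last at about 9.9 hours, consistent with the "~6 minutes" and "~10 hours" figures above.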

srosenberg avatar Oct 17 '22 02:10 srosenberg