
roachtest: multi-region/mixed-version failed

Open cockroach-teamcity opened this issue 1 year ago • 2 comments

roachtest.multi-region/mixed-version failed with artifacts on master @ 2a5e231716c436781f12452d800651f51c6383b7:

(test_runner.go:1185).runTest: test timed out (36h0m0s)
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=true
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-37272

cockroach-teamcity avatar Apr 01 '24 14:04 cockroach-teamcity

The test timed out, but unfortunately we cannot tell whether it got stuck or whether this was a regular execution of a test that is simply too large, because the artifacts failed to publish:

Failed to publish artifacts: Artifact file 'artifacts.zip' has size 8.5 GB which exceeds maximum allowed size of 3 GB. Maximum artifact size is configured at the Administration -> Global Settings page.

We should find ways to make this test shorter.

renatolabs avatar Apr 01 '24 18:04 renatolabs

roachtest.multi-region/mixed-version failed with artifacts on master @ 047a7ed79756eef53b8b9ab4c9dd9c5a463496c9:

(cluster.go:2417).Run: full command output in run_140450.372345979_n81_tpcc-workload-check-.log: COMMAND_PROBLEM: exit status 127
(mixedversion.go:620).Run: panic (stack trace above): t.Fatal() was called
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=true
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

cockroach-teamcity avatar Jun 29 '24 14:06 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 1893ac1d7188dc87dbb2b379b30172bf83ca4645:

(node_lister.go:68).WorkloadNode: workload node specified but no workload nodes were provisioned by the cluster
(mixedversion.go:646).Run: panic (stack trace above): t.Fatal() was called
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

cockroach-teamcity avatar Jul 27 '24 10:07 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 27468a22f82fb1e500bf7b35a62ca58d80abac39:

(cluster.go:2473).Run: full command output in run_150237.261300107_n81_cockroach-workload-i.log: COMMAND_PROBLEM: exit status 1
(mixedversion.go:694).Run: panic (stack trace above): t.Fatal() was called
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=true
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

cockroach-teamcity avatar Aug 10 '24 15:08 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 1993fc04b5116f20f4814d637c7ce87b003687e4:

(mixedversion.go:695).Run: mixed-version test failure while running step 14 (restart node 22 with binary version master): cluster.StopE: timed out after 300s waiting for n22 to drain and shutdown
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=true
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

cockroach-teamcity avatar Aug 17 '24 13:08 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 5cc8013ada42f9ea03eda661a4f178c141f4f24d:

(mixedversion.go:710).Run: mixed-version test failure while running step 3 (start shared-process tenant "mixed-version-tenant-yn2yj"): ~ COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING=true COCKROACH_CONNECT_TIMEOUT=1200 ./cockroach sql --url 'postgres://root@localhost:26257?options=-ccluster%3Dmixed-version-tenant-yn2yj&sslcert=.%2Fcerts%2Fclient.root.crt&sslkey=.%2Fcerts%2Fclient.root.key&sslmode=verify-full&sslrootcert=.%2Fcerts%2Fca.crt' -e "CREATE SCHEDULE IF NOT EXISTS test_only_backup FOR BACKUP INTO 'gs://cockroachdb-backup-testing/roachprod-scheduled-backups/teamcity-16932716-1726621206-57-n81cpu4-geo/mixed-version-tenant-yn2yj/1726650565447818345?AUTH=implicit' RECURRING '*/15 * * * *' FULL BACKUP '@hourly' WITH SCHEDULE OPTIONS first_run = 'now'"
ERROR: service unavailable for target tenant (mixed-version-tenant-yn2yj)
SQLSTATE: 08000
HINT: Double check your "-ccluster=" connection option or your "cluster:" database name prefix.
Failed running "sql": COMMAND_PROBLEM: exit status 1 [owner=test-eng]
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=true
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0

cockroach-teamcity avatar Sep 18 '24 09:09 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 7495c0390f93478bc0f961285f41cd4ec85a6a76:

(test_runner.go:1316).runTest: test timed out (36h0m0s)
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0

cockroach-teamcity avatar Oct 27 '24 22:10 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 17535c13cfed95db70cd8dfb1ba6a700686f57b1:

(test_runner.go:1316).runTest: test timed out (36h0m0s)
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=true
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0

cockroach-teamcity avatar Nov 03 '24 22:11 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ e83bc46aa42f2476b4b11b9703b8038c660dc980:

(mixedversion.go:759).Run: mixed-version test failure while running step 436 (wait for all nodes (:1-80) to acknowledge cluster version '24.2' on system tenant): timed out after 1h0m0s: expected n1 to be at cluster version 24.2, but is still at 24.1-upgrading-to-24.2-step-010: pq: query execution canceled
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=shared-process
  • mvtVersions=v23.1.28 → v23.2.15 → v24.1.6 → v24.2.4 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Nov 17 '24 05:11 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ cea3ff5562160a3bf2802da052da2aaa40e1ccc1:

(cluster.go:2456).Run: full command output in run_165321.985907466_n81_cockroach-workload-i.log: COMMAND_PROBLEM: exit status 1
(mixedversion.go:759).Run: panic (stack trace above): t.Fatal() was called
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=separate-process
  • mvtVersions=v24.1.6 → v24.2.4 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Nov 24 '24 11:11 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ bcc993d796d03664604bf695e38fd5644d0bc952:

(test_runner.go:1474).func1: failed during post test assertions (see test-post-assertions.log): dial tcp 35.196.184.127:26257: connect: connection refused
(test_runner.go:1474).func1: failed during post test assertions (see test-post-assertions.log): dial tcp 35.196.184.127:26257: connect: connection refused
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=system-only
  • mvtVersions=v24.1.7 → v24.2.5 → v24.3.0-rc.1 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Dec 01 '24 01:12 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 9744e5f1676a752d5b200fe7bce84ca8b44afca0:

(test_runner.go:1482).func1: failed during post test assertions (see test-post-assertions.log): dial tcp 34.139.107.46:26257: connect: connection refused
(test_runner.go:1482).func1: failed during post test assertions (see test-post-assertions.log): dial tcp 34.139.107.46:26257: connect: connection refused
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=shared-process
  • mvtVersions=v24.1.7 → v24.2.5 → v24.3.0 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Dec 08 '24 11:12 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 49cff91f3501494deaf038671bc643c194a0e3ca:

(mixedversion.go:766).Run: mixed-version test failure while running step 89 (wait for all nodes (:1-80) to acknowledge cluster version '24.1' on system tenant): timed out after 1h0m0s: expected n1 to be at cluster version 24.1, but is still at 23.2-upgrading-to-24.1-step-024: not upgraded yet
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=system-only
  • mvtVersions=v23.2.16 → v24.1.7 → v24.2.5 → v24.3.0 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Dec 14 '24 20:12 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 0806ee44e36b62dce75175702b5bfb3db03e0577:

(cluster.go:2481).Run: full command output in run_114518.486116727_n81_cockroach-workload-i.log: COMMAND_PROBLEM: exit status 1
(mixedversion.go:783).Run: panic (stack trace above): t.Fatal() was called
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=separate-process
  • mvtVersions=v24.3.1 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Dec 21 '24 23:12 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 686a8fa2cbd2193542949d7d4069725fa36db18d:

(mixedversion.go:783).Run: unexpected node event: n11: cockroach process for system interface died (exit code 137)
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=shared-process
  • mvtVersions=v24.1.9 → v24.3.2 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Dec 28 '24 21:12 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 3cc42e66a71164bd69195ad3c10ab03607a7bc7e:

(mixedversion.go:804).Run: mixed-version test failure while running step 672 (wait for all nodes (:1-80) to acknowledge cluster version <current> on system tenant): timed out after 1h0m0s: expected n1 to be at cluster version 24.3-upgrading-to-25.1-step-018, but is still at 24.3-upgrading-to-25.1-step-016: not upgraded yet
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=system-only
  • mvtVersions=v23.1.30 → v23.2.18 → v24.1.9 → v24.3.2 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Jan 12 '25 13:01 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ f7d019d5e675fa2711b61ffd54a9990b6a1a1da8:

(mixedversion.go:804).Run: mixed-version test failure while running step 3 (start shared-process tenant "mixed-version-tenant-0mti2"): waiting for shared-process tenant on n23: pq: service unavailable for target tenant (mixed-version-tenant-0mti2) [owner=test-eng]
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=shared-process
  • mvtVersions=v24.1.10 → v24.3.3 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Feb 01 '25 10:02 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ bbaa2e50b2fe789527aac09b99fa5eee432e7695:

(mixedversion.go:804).Run: preparing to run step 262: failed to get cluster version for node 22 (mixed-version-tenant-5kkje): pq: query execution canceled
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=separate-process
  • mvtVersions=v24.3.4 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Feb 09 '25 02:02 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ c471a724f99a5add3ccefe07995251760d0e2212:

(cluster.go:2490).Run: full command output in run_114505.112616955_n81_cockroach-workload-i.log: COMMAND_PROBLEM: exit status 1
(mixedversion.go:800).Run: panic (stack trace above): t.Fatal() was called
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • metamorphicBufferedSender=true
  • mvtDeploymentMode=separate-process
  • mvtVersions=v23.2.20 → v24.1.13 → v24.2.10 → v24.3.6 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Feb 23 '25 10:02 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 65de681fae64493f30647bd17dc77424e260c242:

(mixedversion.go:800).Run: preparing to run step 4: failed to get binary version for node 4 (mixed-version-tenant-km3y2): pq: internal error while retrieving user account memberships: operation "get-user-session" timed out after 10.001s (given timeout 10s): internal error while retrieving user account: get auth info error: interrupted during singleflight load-value:authinfo-roachprod-2-2: context deadline exceeded [owner=test-eng]
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • metamorphicBufferedSender=true
  • mvtDeploymentMode=separate-process
  • mvtVersions=v23.2.20 → v24.1.13 → v24.2.10 → v24.3.6 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Mar 01 '25 19:03 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 5c4091ebcdcb2fc4a9b93ca23a9e68b192f9db17:

(cluster.go:2491).Run: full command output in run_082519.550185463_n81_cockroach-workload-i.log: COMMAND_PROBLEM: exit status 1
(mixedversion.go:800).Run: panic (stack trace above): t.Fatal() was called
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • metamorphicBufferedSender=true
  • mvtDeploymentMode=separate-process
  • mvtVersions=v24.3.6 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Mar 09 '25 20:03 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 589b9ba45190130715ee3ab74d3379a3ff3c81c6:

(mixedversion.go:800).Run: mixed-version test failure while running step 687 (wait for all nodes (:1-80) to acknowledge cluster version <current> on mixed-version-tenant-9788q tenant): timed out after 1h0m0s: expected n1 to be at cluster version 25.1-upgrading-to-25.2-step-004, but is still at 24.3-upgrading-to-25.1-step-016: not upgraded yet
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • metamorphicBufferedSender=true
  • mvtDeploymentMode=shared-process
  • mvtVersions=v23.2.21 → v24.1.14 → v24.2.10 → v24.3.8 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Mar 17 '25 04:03 cockroach-teamcity

Failure as of the run in [1],

./cockroach workload init tpcc --regions=us-east1,us-west1,europe-west2,europe-central2 --partitions=4 {pgurl:1} --db tpcc_background --warehouses 600
<truncated> ... took 11h29m6.622186325s, 0.00 MiB/s)
I250309 19:55:22.986730 29 ccl/workloadccl/fixture.go:514  [-] 6  imported 1.5 GiB in history table (17835000 rows, 0 index entries, took 11h29m23.947804152s, 0.04 MiB/s)
I250309 19:55:58.419802 30 ccl/workloadccl/fixture.go:514  [-] 7  imported 1.5 GiB in order table (17835000 rows, 18000000 index entries, took 11h29m59.380509962s, 0.04 MiB/s)
I250309 20:00:08.752521 66 ccl/workloadccl/fixture.go:514  [-] 8  imported 13 GiB in order_line table (179995926 rows, 0 index entries, took 11h34m9.711714441s, 0.31 MiB/s)
I250309 20:00:19.398220 33 ccl/workloadccl/fixture.go:514  [-] 9  imported 19 GiB in stock table (60000000 rows, 0 index entries, took 11h34m20.357654061s, 0.46 MiB/s)
Error: importing fixture: importing table new_order: pq: addsstable [/Tenant/3/Table/113/1/" "/235/1/2101/0,/Tenant/3/Table/113/1/" "/239/10/3000/0/NULL): batch timestamp 1741510349.404492120,0 must be after replica GC threshold 1741513807.930912055,0 (r614: /Tenant/3/Table/113/1/" "{-/PrefixEnd})

The import kicks off at 08:25 and finishes the item table by 08:26; so far so good. The district table, however, is not imported until 19:55, roughly 11.5 hours later!

run_082519.550185463_n81_cockroach-workload-i: 2025/03/09 08:25:19 cluster.go:2515: > ./cockroach workload init tpcc --regions=us-east1,us-west1,europe-west2,europe-central2 --partitions=4 {pgurl:1} --db tpcc_background --warehouses 600
run_082519.550185463_n81_cockroach-workload-i: 2025/03/09 08:25:19 cluster_synced.go:832: Node 81 expanded cmd: ./cockroach workload init tpcc --regions=us-east1,us-west1,europe-west2,europe-central2 --partitions=4 'postgres://roachprod:[email protected]:29000?sslcert=.%2Fcerts%2Fclient.roachprod.crt&sslkey=.%2Fcerts%2Fclient.roachprod.key&sslmode=verify-full&sslrootcert=.%2Fcerts%2Fca.crt' --db tpcc_background --warehouses 600
I250309 08:25:21.460587 1 workload/cli/run.go:656  [-] 1  random seed: 15537425458421579330
I250309 08:25:52.366505 1 ccl/workloadccl/fixture.go:314  [-] 2  starting import of 9 tables
I250309 08:26:07.846918 32 ccl/workloadccl/fixture.go:514  [-] 3  imported 8.9 MiB in item table (100000 rows, 0 index entries, took 8.806879904s, 1.01 MiB/s)
I250309 19:55:05.554071 27 ccl/workloadccl/fixture.go:514  [-] 4  imported 684 KiB in district table (5945 rows, 0 index entries, took 11h29m6.515917079s, 0.00 MiB/s)
I250309 19:55:05.659991 26 ccl/workloadccl/fixture.go:514  [-] 5  imported 40 KiB in warehouse table (595 rows, 0 index entries, took 11h29m6.622186325s, 0.00 MiB/s)
I250309 19:55:22.986730 29 ccl/workloadccl/fixture.go:514  [-] 6  imported 1.5 GiB in history table (17835000 rows, 0 index entries, took 11h29m23.947804152s, 0.04 MiB/s)
I250309 19:55:58.419802 30 ccl/workloadccl/fixture.go:514  [-] 7  imported 1.5 GiB in order table (17835000 rows, 18000000 index entries, took 11h29m59.380509962s, 0.04 MiB/s)
I250309 20:00:08.752521 66 ccl/workloadccl/fixture.go:514  [-] 8  imported 13 GiB in order_line table (179995926 rows, 0 index entries, took 11h34m9.711714441s, 0.31 MiB/s)
I250309 20:00:19.398220 33 ccl/workloadccl/fixture.go:514  [-] 9  imported 19 GiB in stock table (60000000 rows, 0 index entries, took 11h34m20.357654061s, 0.46 MiB/s)
Error: importing fixture: importing table new_order: pq: addsstable [/Tenant/3/Table/113/1/" "/235/1/2101/0,/Tenant/3/Table/113/1/" "/239/10/3000/0/NULL): batch timestamp 1741510349.404492120,0 must be after replica GC threshold 1741513807.930912055,0 (r614: /Tenant/3/Table/113/1/" "{-/PrefixEnd})
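
To spot which tables stalled, the "imported ... took ..." lines above can be filtered by their duration. This is only a minimal Python sketch, not part of roachtest: the helper names (parse_go_duration, slow_imports) and the one-hour threshold are assumptions made for illustration, and only the h/m/s duration units that appear in this log are handled.

```python
import re

# Matches the "imported ... in <table> table (..., took <duration>, ...)" lines
# in the log excerpt above.
IMPORT_RE = re.compile(r"imported .* in (\w+) table \(.*took ([0-9hms.]+),")
# One component of a Go-style duration: hours, minutes (but not "ms"), or seconds.
PART_RE = re.compile(r"([0-9.]+)(h|m(?!s)|s)")
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}


def parse_go_duration(text):
    """Convert a duration such as '11h29m6.5s' to seconds; other units are ignored."""
    return sum(float(n) * UNIT_SECONDS[u] for n, u in PART_RE.findall(text))


def slow_imports(lines, threshold_s=3600.0):
    """Return (table, seconds) for every import that took longer than threshold_s."""
    slow = []
    for line in lines:
        match = IMPORT_RE.search(line)
        if match:
            seconds = parse_go_duration(match.group(2))
            if seconds > threshold_s:
                slow.append((match.group(1), seconds))
    return slow
```

Fed the log lines above, this would list every table except item, since all the others report durations above eleven hours.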

At 08:26, on n1,

W250309 08:26:20.686200 20358 kv/kvclient/kvcoord/dist_sender.go:2764 ⋮ [T3,Vmixed-version-tenant-tnxog,nsql1,job=IMPORT id=1053289218348810241] 1173  slow replica RPC: have been waiting 10.06s (0 attempts) for RPC Scan [/Tenant/3/Table/3/1/‹108›/‹2›/‹1›,/Tenant/3/Table/3/1/‹114›/‹2›/‹1›), Get [/Tenant/3/Table/3/1/‹115›/‹2›/‹1›], Scan [/Tenant/3/Table/5/1/‹108›/‹2›/‹1›,/Tenant/3/Table/5/1/‹114›/‹2›/‹1›), Get [/Tenant/3/Table/5/1/‹115›/‹2›/‹1›], [txn: 32bc5c96] to replica (n9,s9):6; resp: ‹(err: <nil>), *kvpb.ScanResponse, *kvpb.GetResponse, *kvpb.ScanResponse, *kvpb.GetResponse›
I250309 08:28:10.397910 20317 sql/importer/distsql_plan_bulk.go:169 ⋮ [T3,Vmixed-version-tenant-tnxog,nsql1,job=IMPORT id=1053289218340126721] 1217  Re-planning would add or alter flows on 80 nodes / 1.00, threshold 0.00, replan false

We also see latency jump warnings,

W250309 08:30:53.918844 138 2@rpc/clock_offset.go:286 ⋮ [T3,Vmixed-version-tenant-tnxog,nsql1,rnode=38,raddr=‹10.142.1.225:26257›,class=default,rpc] 1314  latency jump (prev avg 107.52ms, current 194.39ms)
W250309 08:31:14.573356 21727 2@rpc/clock_offset.go:286 ⋮ [T3,Vmixed-version-tenant-tnxog,nsql1,rnode=31,raddr=‹10.142.1.192:26257›,class=default,rpc] 1315  latency jump (prev avg 107.57ms, current 433.39ms)

Also on n1, we observe switching to fallback rate, which happens only if token bucket RPC requests don't complete in time,

I250309 08:22:29.237591 593 ccl/multitenantccl/tenantcostclient/tenant_side.go:508 ⋮ [T3,Vmixed-version-tenant-tnxog,nsql1] 118  switching to fallback rate 10133.16799 tokens/s
I250309 08:22:41.237548 593 ccl/multitenantccl/tenantcostclient/tenant_side.go:508 ⋮ [T3,Vmixed-version-tenant-tnxog,nsql1] 126  switching to fallback rate 10179.4629 tokens/s
I250309 08:23:01.237384 593 ccl/multitenantccl/tenantcostclient/tenant_side.go:508 ⋮ [T3,Vmixed-version-tenant-tnxog,nsql1] 143  switching to fallback rate 10232.15437 tokens/s
I250309 08:23:26.236889 593 ccl/multitenantccl/tenantcostclient/tenant_side.go:508 ⋮ [T3,Vmixed-version-tenant-tnxog,nsql1] 162  switching to fallback rate 10298.84633 tokens/s
I250309 08:47:17.236802 593 ccl/multitenantccl/tenantcostclient/tenant_side.go:508 ⋮ [T3,Vmixed-version-tenant-tnxog,nsql1] 1815  switching to fallback rate +Inf tokens/s

The message "switching to fallback rate +Inf tokens/s" is suspicious: it's not immediately clear why the fallback rate would be +Inf. It could also imply that the result (tryAgainAfter) returned by tokenBucket.TryToFulfill could end up being +Inf.
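
To check whether the +Inf rate shows up in other runs, the tenant logs can be scanned for fallback-rate messages with a non-finite value. The following is a hedged Python sketch; the function name is hypothetical, and the message format is assumed to match the tenant_side.go lines quoted above.

```python
import math
import re

# Captures the rate token in "switching to fallback rate <rate> tokens/s".
FALLBACK_RE = re.compile(r"switching to fallback rate (\S+) tokens/s")


def non_finite_fallback_rates(lines):
    """Return the log lines whose fallback rate is infinite or NaN."""
    flagged = []
    for line in lines:
        match = FALLBACK_RE.search(line)
        if not match:
            continue
        try:
            rate = float(match.group(1))  # float() accepts "+Inf" and "NaN"
        except ValueError:
            continue  # not a number at all; ignored in this sketch
        if not math.isfinite(rate):
            flagged.append(line)
    return flagged
```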

[1] https://github.com/cockroachdb/cockroach/issues/121455#issuecomment-2709049637

srosenberg avatar Mar 24 '25 19:03 srosenberg

The most recent failure [1] timed out after 1h of waiting for the cluster to upgrade, from 24.3 to master,

timed out after 1h0m0s: expected n1 to be at cluster version 25.1-upgrading-to-25.2-step-004, but is still at 24.3-upgrading-to-25.1-step-016

The issue appears to be the job table backfill migration,

I250317 03:11:03.148398 3338000 upgrade/upgrademanager/manager.go:716 ⋮ [T2,Vmixed-version-tenant-9788q,n78,intExec=set-version,migration-mgr] 3832  waiting for Upgrade to 24.3-upgrading-to-25.1-step-018: "backfill new jobs tables"

which can take a long time. After a recent fix [2], progress is being made, albeit at a very slow pace; we see this update just before the timeout.

logs/18.unredacted/cockroach.log:I250317 04:14:40.961598 4566873 upgrade/upgrades/v25_1_add_jobs_tables.go:221 ⋮ [T2,Vmixed-version-tenant-9788q,n18,job=MIGRATION id=1055492209439670290,upgrade=24.3-upgrading-to-25.1-step-018] 4848  backfilled new columns for 3367 of 7987 jobs so far
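
For a rough sense of the pace, the two log timestamps above can be turned into a rate and a remaining-time estimate. This is a back-of-the-envelope Python sketch; it assumes the backfill started at about the time the "waiting for Upgrade" message was logged, which the logs do not confirm, and the function names are made up for illustration.

```python
from datetime import datetime


def parse_log_time(prefix):
    """Parse a log prefix such as 'I250317 03:11:03.148398' (severity + timestamp)."""
    return datetime.strptime(prefix[1:], "%y%m%d %H:%M:%S.%f")


def backfill_estimate(start, now, done, total):
    """Return (jobs per second so far, seconds remaining at that rate)."""
    elapsed = (parse_log_time(now) - parse_log_time(start)).total_seconds()
    rate = done / elapsed
    return rate, (total - done) / rate


rate, remaining_s = backfill_estimate(
    "I250317 03:11:03.148398",  # "waiting for Upgrade ..." (assumed start of backfill)
    "I250317 04:14:40.961598",  # "backfilled new columns for 3367 of 7987 jobs so far"
    done=3367,
    total=7987,
)
print(f"{rate:.2f} jobs/s, about {remaining_s / 60:.0f} more minutes")
```

Under that assumption the rate comes out a little below one job per second, so the backfill would have needed well over an hour more to finish.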

@dt For reference, this is an 80-node cluster with a fairly light tpcc workload in the background. Is it reasonable for the jobs table backfill to take longer than 1h? Granted, I'm not sure why we have so many jobs (7987), and sadly we were unable to grab debug.zip.

[1] https://github.com/cockroachdb/cockroach/issues/121455#issuecomment-2728101093 [2] https://github.com/cockroachdb/cockroach/pull/141420

srosenberg avatar Mar 24 '25 21:03 srosenberg

Is it reasonable for the jobs table backfill to take longer than 1h?

In multi-region clusters we expect writes to system tables (rf=5) to be much slower. We pessimized this backfill to process a single row at a time, trading away throughput for less contention and a more reliable expectation that it will, eventually, finish rather than get stuck in infinite retries and stop making progress. As such, we now expect it to process roughly 1 job/second in an MR cluster. If this cluster has 7987 jobs prior to backfill, that would mean we expect it to take just over two hours.
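
The arithmetic behind that estimate, as a small Python sketch. The 1 job/second rate and the job count come from this thread; the function name is hypothetical.

```python
from datetime import timedelta


def expected_backfill(jobs, jobs_per_second=1.0):
    """Expected backfill duration when rows are processed one at a time."""
    return timedelta(seconds=jobs / jobs_per_second)


estimate = expected_backfill(7987)
step_timeout = timedelta(hours=1)  # the "timed out after 1h0m0s" in the failure
print(estimate, estimate > step_timeout)  # prints "2:13:07 True"
```

At that rate the backfill alone takes more than twice as long as the one-hour wait the test allowed.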

dt avatar Mar 25 '25 13:03 dt

roachtest.multi-region/mixed-version failed with artifacts on master @ 5ced8ed703455399acd29595b8f865cf709045be:

(mixedversion.go:829).Run: mixed-version test failure while running step 173 (wait for all nodes (:1-52) to acknowledge cluster version '25.1' on mixed-version-tenant-lfvp3 tenant): timed out after 2h0m0s: expected n1 to be at cluster version 25.1, but is still at 24.3-upgrading-to-25.1-step-016: not upgraded yet
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • metamorphicBufferedSender=false
  • mvtDeploymentMode=shared-process
  • mvtVersions=v24.3.8 → v25.1.2 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Mar 30 '25 18:03 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 6d76d83c6092dc8dc02778fec6b419372c3be69b:

(mixedversion.go:829).Run: preparing to run step 68: failed to get binary version for node 5 (mixed-version-tenant-3j8oz): pq: internal error while retrieving user account memberships: operation "get-user-session" timed out after 10s (given timeout 10s): internal error while retrieving user account: get auth info error: interrupted during singleflight load-value:authinfo-roachprod-2-2: context deadline exceeded
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • metamorphicBufferedSender=false
  • mvtDeploymentMode=separate-process
  • mvtVersions=v24.1.16 → v24.2.10 → v24.3.10 → v25.1.4 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Apr 19 '25 16:04 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 7552695eea44c6be9eecdd722b13c10821853a69:

(monitor.go:149).Wait: monitor failure: full command output in run_135447.718826265_n53_cockroach-workload-r.log: COMMAND_PROBLEM: exit status 1
(mixedversion.go:829).Run: panic (stack trace above): t.Fatal() was called
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • metamorphicBufferedSender=false
  • mvtDeploymentMode=system-only
  • mvtVersions=v24.3.10 → v25.1.4 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar Apr 27 '25 14:04 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ 210cac0e4000426d486e98f204864a704e66e253:

(cluster.go:2498).Run: full command output in run_142557.435925667_n53_cockroach-workload-i.log: COMMAND_PROBLEM: exit status 1
(mixedversion.go:829).Run: panic (stack trace above): t.Fatal() was called
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • metamorphicBufferedSender=false
  • mvtDeploymentMode=separate-process
  • mvtVersions=v24.1.17 → v24.2.10 → v24.3.11 → v25.1.5 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar May 03 '25 14:05 cockroach-teamcity

roachtest.multi-region/mixed-version failed with artifacts on master @ a085c926b3ddeabb029d4ff5135d2165676ae7a2:

(mixedversion.go:829).Run: preparing to run step 69: failed to get binary version for node 8 (mixed-version-tenant-20new): pq: internal error while retrieving user account memberships: operation "get-user-session" timed out after 10s (given timeout 10s): internal error while retrieving user account: get auth info error: interrupted during singleflight load-value:authinfo-roachprod-2-2: context deadline exceeded
test artifacts and logs in: /artifacts/multi-region/mixed-version/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=separate-process
  • mvtVersions=v24.3.12 → v25.1.5 → master
  • runtimeAssertionsBuild=false
  • ssd=0

cockroach-teamcity avatar May 18 '25 02:05 cockroach-teamcity