Stan Rosenberg
[bcov.log.gz](https://github.com/abenkhadra/bcov/files/7840244/bcov.log.gz)
> Detecting the SSH flake via substring still works and keeps this PR simple. The chance of a false positive (via substring) is probably very low, but structured errors should...
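For context, a rough Go sketch of the trade-off between the two approaches (this is not the actual roachprod/roachtest code; the `sshFlakeError` type and the matched message are hypothetical):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sshFlakeError is a hypothetical structured error type; marking the flake
// with a dedicated type lets callers detect it via errors.As instead of
// scraping the error message.
type sshFlakeError struct {
	cause error
}

func (e *sshFlakeError) Error() string { return fmt.Sprintf("ssh flake: %v", e.cause) }
func (e *sshFlakeError) Unwrap() error { return e.cause }

// isSSHFlakeSubstring is the substring-based detection: simple, but it can
// (rarely) match an unrelated error that happens to contain the same text.
func isSSHFlakeSubstring(err error) bool {
	return err != nil && strings.Contains(err.Error(), "ssh_exchange_identification")
}

// isSSHFlakeStructured relies on the error type rather than the message.
func isSSHFlakeStructured(err error) bool {
	var flake *sshFlakeError
	return errors.As(err, &flake)
}

func main() {
	err := fmt.Errorf("running cmd: %w",
		&sshFlakeError{cause: errors.New("ssh_exchange_identification: Connection closed by remote host")})
	fmt.Println(isSSHFlakeSubstring(err))  // true
	fmt.Println(isSSHFlakeStructured(err)) // true
}
```

The structured variant survives message rewording and cannot match an unrelated error that merely contains the same text.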
I recently spotted another issue... seeing this error message quite often during cluster teardown,

```
teardown: 16:53:25 cluster.go:1149: failed to fetch logs: cluster.Get: get logs failed
```

The above error...
The most recent failure seems unrelated to the other mixed-version failures, namely `version/mixed/nodes=3` and `version/mixed/nodes=5`. (Both failed because of the recent change requiring `COCKROACH_UPGRADE_TO_DEV_VERSION` [1].) Also, this failure doesn't...
From `teardown.log`, we can see that the background tpcc workload fails after ~5 minutes,

```
I220908 17:54:41.085738 1 workload/cli/run.go:427 [-] 1 creating load generator...
I220908 17:54:41.282881 1 workload/cli/run.go:458 [-] 2...
```
Examining both system and application metrics, I see nothing anomalous. All nodes have ample system resources. The graphs below corroborate that both `n1` and `n3` terminate at `17:59:31` while the other two...
The latest failure has the same failure mode,

```
Oct 03 15:59:13 teamcity-6749797-1664774404-105-n5cpu16-0003 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE
```

Ongoing internal investigation: https://cockroachlabs.slack.com/archives/C01CDD4HRC5/p1664819770906019?thread_ts=1664295784.890119&cid=C01CDD4HRC5
Yet another example of a node doing `exit 1` without any stack trace. In `test.log`,

```
14:48:18 tpcc.go:254: test worker status: running tpcc worker=0 warehouses=909 ramp=5m0s duration=2h0m0s on {pgurl:1-4} (
```
The last failure is an entirely different failure mode. The [bank import step](https://github.com/cockroachdb/cockroach/blob/21365bf129f9eb618fcf23999129d82794d76f88/pkg/cmd/roachtest/tests/tpcc.go#L416) appears to run for hours until it's killed due to the test timing out. The preceding step to import...
@stevendanna Would you mind taking a look at the logs to see what could possibly have caused the import to run for ~10 hours? The last warning message concerning the...