serverccl: package times out during shutdown causing flakes
Describe the problem
The serverccl package has been timing out.
One theory:
It seems the server fails to shut down: we wait for quiescence, but the contexts are apparently not being canceled correctly, so we end up in an infinite retry inside kv/kvclient/rangecache.(*RangeCache).tryLookup.
Using a bisect I landed on 262a70d506e0b1f14ac1ba4ab831885c26bcd901 as the first bad commit, but I don't see why.
To Reproduce
The TestNoInflightTracesVirtualTableOnTenant test reproduces it.
./dev test pkg/ccl/serverccl --stress --filter=TestNoInflightTracesVirtualTableOnTenant --timeout=2m --test-args='-test.timeout 20s'
Jira issue: CRDB-18494
The stack trace below is telling.
* goroutine 53508 [select]:
* github.com/cockroachdb/cockroach/pkg/util/retry.(*Retry).Next(0xc001a4fe10)
* github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:127 +0x13e
* github.com/cockroachdb/cockroach/pkg/sql/catalog/schematelemetry/schematelemetrycontroller.updateSchedule({0x5eb3ed8, 0xc010228d20}, 0xc6567c?, {0x5efb720, 0xc00d7db6e0}, 0xc010634000)
* github.com/cockroachdb/cockroach/pkg/sql/catalog/schematelemetry/schematelemetrycontroller/pkg/sql/catalog/schematelemetry/schematelemetrycontroller/controller.go:149 +0x266
https://github.com/cockroachdb/cockroach/pull/85945 seems to have fixed it