cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

roachtest: fix crash on cluster creation failure

Open andreimatei opened this issue 1 year ago • 5 comments

Before this patch, we'd segfault if cluster creation fails and we're running with --debug-always.

This patch is a small protection against the problem, but I don't think it's the right fix. The structure of the code makes similar bugs very likely to introduce -- I've made the same mistake myself. The issue is that, since #73706, the point where the cluster creation fails is disconnected from the point where that error is handled. It's very easy for code in between to assume that there is a cluster. Besides the bug being fixed here, there are some other awkward protections against a nil cluster around that seem surprising.

Epic: None Release note: None

andreimatei avatar Jun 26 '24 19:06 andreimatei

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] avatar Jun 26 '24 19:06 blathers-crl[bot]

This change is Reviewable

cockroach-teamcity avatar Jun 26 '24 19:06 cockroach-teamcity

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] avatar Jun 26 '24 19:06 blathers-crl[bot]

cc @srosenberg @DarrylWong

andreimatei avatar Jun 26 '24 19:06 andreimatei

Before this patch, we'd segfault if cluster creation fails and we're running with --debug-always.

This patch is a small protection against the problem, but I don't think it's the right fix. The structure of the code makes similar bugs very likely to introduce -- I've made the same mistake myself. The issue is that, since #73706, the point where the cluster creation fails is disconnected from the point where that error is handled. It's very easy for code in between to assume that there is a cluster. Besides the bug being fixed here, there are some other awkward protections against a nil cluster around that seem surprising.

Thanks for the fix! Indeed, that code is overdue for restructuring. At some point, I'm still hoping to revisit a previous attempt [1].

[1] https://github.com/cockroachdb/cockroach/pull/101814

srosenberg avatar Jun 27 '24 01:06 srosenberg

bors r=srosenberg,darrylwong,herkolategan

DarrylWong avatar Jul 01 '24 14:07 DarrylWong