cockroach
release-24.1: allocatorimpl: include full store count in allocator error
Backport 1/1 commits from #124073 on behalf of @kvoli.
/cc @cockroachdb/release
An allocator error is returned when a new or replacement replica cannot be allocated to a store. The error reports how many live stores there are and how many of them already hold a replica of the range, along with any constraints. This makes it easier to determine the cause of up-replication stalls.
Also include in the error the number of live stores that are ineligible because their disks are full. The updated error message when at least one store is full:
0 of N live stores are able to take a new replica for the range (X full disk, ...)
Here, N is the number of live stores, of which X have a full disk.
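A minimal Go sketch of how such a message could be assembled, assuming a simplified, hypothetical `storeInfo` type with `Live`, `FullDisk`, and `HasReplica` flags. The names `storeInfo`, `allocationError`, and `newAllocationError` are illustrative only and are not the actual `allocatorimpl` API. Only live stores are counted, so X can never exceed N, and the full-disk clause is omitted when no store is full.

```go
package main

import (
	"fmt"
	"strings"
)

// storeInfo is a hypothetical, simplified view of a store; the real
// allocatorimpl package uses its own descriptor and candidate types.
type storeInfo struct {
	Live       bool // the store's node is live
	FullDisk   bool // the store is ineligible because its disk is full
	HasReplica bool // the store already holds a replica of the range
}

// allocationError is a hypothetical error type recording the counts
// reported when no live store can accept a new replica.
type allocationError struct {
	liveStores       int
	fullDisk         int
	existingReplicas int
}

// Error formats the message. The leading "0" assumes the error is only
// raised when no candidate store was found.
func (e *allocationError) Error() string {
	var details []string
	if e.fullDisk > 0 {
		details = append(details, fmt.Sprintf("%d full disk", e.fullDisk))
	}
	// Simplification: the real message distinguishes voters and non-voters.
	details = append(details, fmt.Sprintf("%d already have a replica", e.existingReplicas))
	return fmt.Sprintf("0 of %d live stores are able to take a new replica for the range (%s)",
		e.liveStores, strings.Join(details, ", "))
}

// newAllocationError counts only live stores, so fullDisk <= liveStores.
// A store may be counted in more than one category.
func newAllocationError(stores []storeInfo) *allocationError {
	e := &allocationError{}
	for _, s := range stores {
		if !s.Live {
			continue // dead stores are excluded from every count
		}
		e.liveStores++
		if s.FullDisk {
			e.fullDisk++
		}
		if s.HasReplica {
			e.existingReplicas++
		}
	}
	return e
}

func main() {
	stores := []storeInfo{
		{Live: true, HasReplica: true},
		{Live: true, HasReplica: true},
		{Live: true, FullDisk: true},
		{Live: false},
	}
	fmt.Println(newAllocationError(stores).Error())
}
```

With three live stores (two holding replicas, one full) and one dead store, this prints `0 of 3 live stores are able to take a new replica for the range (1 full disk, 2 already have a replica)`.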
Resolves: #118313
Release note: None
Release justification: Support-related issue; adds observability without any functional changes.
Thanks for opening a backport.
Please check the backport criteria before merging:
- [x] Backports should only be created for serious issues or test-only changes.
- [x] Backports should not break backwards-compatibility.
- [x] Backports should change as little code as possible.
- [x] Backports should not change on-disk formats or node communication protocols.
- [x] Backports should not add new functionality (except as defined here).
- [x] Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
- [ ] All backports must be reviewed by the owning area's TL and one additional TL. For more information as to how that review should be conducted, please consult the backport policy.
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
- [ ] There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
- [ ] The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
- [ ] New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
- [ ] The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
- [ ] Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.
Also, please add a brief release justification to the body of your PR to justify this backport.
TYFTR
The extended failure(s) are unrelated, probably due to pre-emption.