cockroach
cockroach copied to clipboard
release-22.1: backupccl: elide spans from backups that were subsequently reintroduced
Backport 1/1 commits from #87312.
/cc @cockroachdb/release
Currently RESTORE may restore invalid backup data from a backed up table that underwent an IMPORT rollback. See https://github.com/cockroachdb/cockroach/issues/87305 for a detailed explanation.
This patch ensures that RESTORE elides older backup data that were deleted via a non-MVCC operation. Because incremental backups always reintroduce spans (i.e. backs them up from timestamp 0) that may have undergone a non-mvcc operation, restore can identify restoring spans with potentially corrupt data in the backup chain and only ingest the spans' reintroduced data to any system time, without the corrupt data.
Here's the basic impliemenation in Restore:
- For each span we want to restore
- identify the last time, l, the span was introduced, using the manifests
- dont restore the span using a backup if backup.EndTime < l
This implementation rests on the following assumption: the input spans for each restoration flow (created in createImportingDescriptors) and the restoreSpanEntries (created by makeSimpleImportSpans) do not span across multiple tables. Given this assumption, makeSimpleImportSpans skips adding files from a backups for a given input span that was reintroduced in a subsequent backup.
It's worth noting that all significant refactoring occurs on code run by the restore coordinator; therefore, no special care needs to be taken for mixed / cross version backups. In other words, if the coordinator has updated, the cluster restores properly; else, the bug will exist on the restored cluster. It's also worth noting that other forms of this bug are apparent on older cluster versions (https://github.com/cockroachdb/cockroach/issues/88042, https://github.com/cockroachdb/cockroach/issues/88043) and has not been noticed by customers; thus, there is no need to fail a mixed version restore to protect the customer from this already existing bug.
Informs https://github.com/cockroachdb/cockroach/issues/87305
Release justification: bug fix
Release note: fix for TA advisory https://cockroachlabs.atlassian.net/browse/TSE-198
Thanks for opening a backport.
Please check the backport criteria before merging:
- [ ] Patches should only be created for serious issues or test-only changes.
- [ ] Patches should not break backwards-compatibility.
- [ ] Patches should change as little code as possible.
- [ ] Patches should not change on-disk formats or node communication protocols.
- [ ] Patches should not add new functionality.
- [ ] Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
- [ ] There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
- [ ] The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
- [ ] New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
- [ ] The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
Add a brief release justification to the body of your PR to justify this backport.
Some other things to consider:
- What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
- Will this work in a cluster of mixed patch versions? Did we test that?
- If a user upgrades a patch version, uses this feature, and then downgrades, what happens?
Note that the 22.1 backport required the following additional changes:
- adding the MakeFrontierAt function
- an additional commit in to address https://github.com/cockroachdb/cockroach/issues/88042 . I'll also need to forward port this to 22.2 in order to handle cross version backup chains, but it's easier to test in 22.1, so I added it here first.
- a few test-only changes that I plan to also change in master / 22.2
RFAL! I'd review commit by commit. As discussed over meets:
- y'all have never seen the first commit. I plan to forward port it to 22.2.
- the second commit is the same as what's on 22.2, aside from a a few function signature and test only changes. I also had to add the MakeFrontierAt function.
apologies for the re-pushing. Realized the dd tests ended up in the first commit. They don't pass unless they are in the second commit. Also added a case to RestoreSpaneEntryCoverExample