rocksdb icon indicating copy to clipboard operation
rocksdb copied to clipboard

Allow kPointInTimeRecovery to recover until corrupted write batch

Open cbi42 opened this issue 1 year ago • 3 comments

Summary: instead of failing DB open, when we see a corrupted write batch, we should follow the convention to report corruption and decide on what to do based on wal recovery mode.

Test plan: added a new unit test.

cbi42 avatar Jul 05 '24 16:07 cbi42

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Jul 05 '24 19:07 facebook-github-bot

@jowlyzhang I think the call to HandleWriteBatchTimestampSizeDifference() above probably needs similar handling since it could iterate through a write batch.

Thanks for the fix and thanks for this note. A quick question, my understanding is kPointInTimeRecovery tries to recover a prefix of the wal records. With this type of handling, if the mal formatted WriteBatch is at the tail of the wal records, I think it's helpful to proceed like this to allow open and still be at point in time. But if the mal formatted WriteBatch is in the middle of the wal records, if we continue like this to recover the latter good WriteBatches, would it contradict what point in time means?

jowlyzhang avatar Jul 10 '24 23:07 jowlyzhang

@jowlyzhang I think the call to HandleWriteBatchTimestampSizeDifference() above probably needs similar handling since it could iterate through a write batch.

Thanks for the fix and thanks for this note. A quick question, my understanding is kPointInTimeRecovery tries to recover a prefix of the wal records. With this type of handling, if the mal formatted WriteBatch is at the tail of the wal records, I think it's helpful to proceed like this to allow open and still be at point in time. But if the mal formatted WriteBatch is in the middle of the wal records, if we continue like this to recover the latter good WriteBatches, would it contradict what point in time means?

reporter.Corruption() will set the status and the while loop checks status.ok() and should exit after a corruption is set. BTW we may want to keep the current behavior (to fail DB open) since these bad write batches are strong indicator for CPU/memory corruption.

cbi42 avatar Jul 11 '24 04:07 cbi42