
compute_tools: disable fast path for safekeeper sync

Open erikgrinaker opened this issue 1 year ago • 1 comments

Problem

In #9259, we found that the check_safekeepers_synced fast path could result in a lower basebackup LSN than the flush_lsn reported by Safekeepers in VoteResponse, causing the compute to panic once on startup.

This would happen if the Safekeeper had unflushed WAL records due to a compute disconnect. The TIMELINE_STATUS query would report a flush_lsn below these unflushed records, while VoteResponse would flush the WAL and report the advanced flush_lsn. See https://github.com/neondatabase/neon/issues/9259#issuecomment-2410849032.
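The mismatch described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToySafekeeper`, `Lsn`, `write_lsn`, and `basebackup_is_stale` are hypothetical names invented for illustration and are not the actual Safekeeper or compute_tools code. It assumes the status query only reads the current flush position, while answering a vote first flushes pending WAL.

```rust
// Toy model only: every type and function here is hypothetical and
// does not mirror the real Safekeeper or compute_tools implementation.

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Lsn(u64);

struct ToySafekeeper {
    /// Position up to which WAL has been durably flushed.
    flush_lsn: Lsn,
    /// Position up to which WAL has been written but not necessarily flushed.
    write_lsn: Lsn,
}

impl ToySafekeeper {
    /// Models the TIMELINE_STATUS query: reports flush_lsn as-is,
    /// without flushing any pending records.
    fn timeline_status(&self) -> Lsn {
        self.flush_lsn
    }

    /// Models VoteResponse: flushes pending WAL first, then reports
    /// the advanced flush_lsn.
    fn vote_response(&mut self) -> Lsn {
        self.flush_lsn = self.write_lsn;
        self.flush_lsn
    }
}

/// True if a basebackup taken at `basebackup_lsn` lags behind the
/// flush_lsn later reported in the vote.
fn basebackup_is_stale(basebackup_lsn: Lsn, voted_flush_lsn: Lsn) -> bool {
    basebackup_lsn < voted_flush_lsn
}

fn main() {
    // A compute disconnect left records between 100 and 120 unflushed.
    let mut sk = ToySafekeeper {
        flush_lsn: Lsn(100),
        write_lsn: Lsn(120),
    };

    // Fast path: the basebackup LSN is chosen from the status query.
    let basebackup_lsn = sk.timeline_status();

    // Later, the vote flushes the WAL and reports a higher flush_lsn.
    let voted_flush_lsn = sk.vote_response();

    println!(
        "basebackup at {:?}, vote reports {:?}, stale: {}",
        basebackup_lsn,
        voted_flush_lsn,
        basebackup_is_stale(basebackup_lsn, voted_flush_lsn)
    );
}
```

In this model the two reports disagree only while unflushed records exist; once the vote has flushed them, a subsequent status query returns the same LSN.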

Summary of changes

Disable the check_safekeepers_synced fast path. Discussion in https://github.com/neondatabase/neon/issues/9259#issuecomment-2417314252 indicates that it has questionable value and may need further improvements.

There is also a separate fix to flush the Safekeeper WAL on disconnect: https://github.com/neondatabase/neon/pull/9436.

Checklist before requesting a review

  • [ ] I have performed a self-review of my code.
  • [ ] If it is a core feature, I have added thorough tests.
  • [ ] Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • [ ] If this PR requires a public announcement, mark it with the /release-notes label and add several sentences in this section.

Checklist before merging

  • [ ] Do not forget to reformat the commit message so that it does not include the above checklist.

erikgrinaker, Oct 17 '24 08:10