
Potential missing data on restore due to fsync not guaranteeing writes to disk

Open · robacourt opened this issue 5 months ago · 1 comment

As mentioned by @alco in #2925, fsync (or :file.datasync/1 in our code) does not guarantee that the data has been written to disk, only that the OS has been asked to write it. If power is lost, the data may not be recoverable. #2974 acks on flush/fsync, which is a big improvement on what we had before, but it still leaves a window for data loss: if we ack the flushed data and power is lost before that data actually reaches the disk, the system restarts and resumes from a point after the missing data.
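To make the ordering concrete, here is a minimal Python sketch of "ack only after flush/fsync", assuming a plain append-only log file. The function name append_and_ack is hypothetical, and os.fsync stands in for :file.datasync/1; this is not Electric's actual implementation. Note that the sketch still has exactly the gap described above: once os.fsync returns we ack, but the write may still sit in a volatile device cache.

```python
import os

def append_and_ack(path: str, entry: bytes) -> int:
    """Append `entry` to a log file and return the log size that is safe to ack.

    Hypothetical helper: the size is only returned after os.fsync() has
    returned, so a caller that acks on the return value acks strictly
    after the flush.
    """
    with open(path, "ab") as f:
        f.write(entry)
        f.flush()             # Python's userspace buffer -> OS page cache
        os.fsync(f.fileno())  # OS page cache -> storage device, as far as the OS can tell
        return os.fstat(f.fileno()).st_size  # bytes we are now willing to ack
```

A caller would ack up to the returned size. If the drive loses its cache on power failure, the acked bytes can still disappear, which is the loophole this issue is about.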

As @icehaunter has mentioned, we could look at how other databases, such as Postgres, have closed this loophole.

robacourt, Aug 12 '25 13:08

I think a potential strategy is to persist the last acked LSN once all shape logs for that LSN have been flushed, and to use that persisted LSN as the reference for what is guaranteed to be on disk.
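One way to picture this proposal is the Python sketch below. It assumes, hypothetically, that each shape reports the LSN up to which its log has been flushed (LSNs modelled as plain integers), so the highest LSN flushed by every shape is the minimum of those values. The watermark is then written with the common write-to-temp, fsync, rename, fsync-the-directory pattern so that a crash leaves either the old value or the new one. The names safe_watermark, persist_watermark and read_watermark are illustrative only.

```python
import os
from typing import Dict, Optional

def safe_watermark(flushed_lsn_by_shape: Dict[str, int]) -> Optional[int]:
    """Highest LSN that every shape has flushed (assumes per-shape tracking)."""
    if not flushed_lsn_by_shape:
        return None
    return min(flushed_lsn_by_shape.values())

def persist_watermark(path: str, lsn: int) -> None:
    """Atomically replace the watermark file: write temp, fsync, rename, fsync dir."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(lsn))
        f.flush()
        os.fsync(f.fileno())  # the new value is flushed before it becomes visible
    os.replace(tmp, path)     # atomic rename on POSIX: old value or new, never a mix
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)      # flush the directory entry so the rename itself persists
    finally:
        os.close(dir_fd)

def read_watermark(path: str) -> Optional[int]:
    """Return the persisted flushed LSN, or None if none has been written yet."""
    try:
        with open(path) as f:
            return int(f.read())
    except FileNotFoundError:
        return None
```

The watermark only ever moves to a value that all shapes have already flushed, so it trails the real state. That is the conservative direction for recovery.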

On recovery, if the shutdown was dirty (#2984), we drop any shapes whose logs are ahead of the persisted flushed LSN.
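The recovery rule can be sketched in Python as a pure function over shape handles and their last log LSNs. Everything here is hypothetical: shapes are keyed by string handles, LSNs are integers, and, as an added assumption not stated in the comment, a dirty shutdown with no persisted LSN at all is treated as "nothing is known to be durable".

```python
from typing import Dict, List, Optional

def shapes_to_drop(
    shape_log_lsn: Dict[str, int],
    persisted_lsn: Optional[int],
    dirty_shutdown: bool,
) -> List[str]:
    """Return the handles of shapes whose logs may contain unpersisted data."""
    if not dirty_shutdown:
        return []  # clean shutdown: logs are trusted as-is
    if persisted_lsn is None:
        # Assumption: with no persisted watermark, every shape is suspect.
        return sorted(shape_log_lsn)
    # A shape whose log is ahead of the watermark may hold entries that were
    # acked but never reached the disk, so it cannot be trusted.
    return sorted(h for h, lsn in shape_log_lsn.items() if lsn > persisted_lsn)
```

For example, with logs at LSNs {"a": 5, "b": 9, "c": 7} and a persisted LSN of 7, only "b" would be dropped after a dirty shutdown.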

balegas, Aug 18 '25 10:08