
Support resumable download for checkpoint files during full duplication

Open empiredan opened this issue 8 months ago • 0 comments

Motivation

During full duplication, the dup follower must copy all checkpoint files from the dup master before it can start incremental synchronization. If the download is interrupted (e.g., by a network failure or a machine crash), the follower must re-copy all checkpoint files from scratch. This is costly, especially when the checkpoint files are large and the network is slow (e.g., synchronization across the public network).

To address this, the checkpoint file copying process needs to support resumable downloads.

Implementation

When the dup follower requests the latest checkpoint information from the dup master, the dup master will return the size and checksum of each file in addition to the list of filenames under the checkpoint directory.
The dup follower can then compare this information with its local files to determine:

  • Which files are already fully downloaded and do not need to be re-downloaded;
  • Which files still need to be fetched from the dup master.

This ensures that only the incomplete or missing files are downloaded, so an interrupted copy resumes with the files that remain instead of restarting from scratch.
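
The comparison described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the linked pull request: the names `FileMeta`, `describe_checkpoint`, and `files_to_fetch` are hypothetical, and SHA-256 is an assumption, since this issue does not say which checksum algorithm is used.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file record returned by the dup master."""
    name: str      # filename under the checkpoint directory
    size: int      # file size in bytes
    checksum: str  # hex digest (SHA-256 is an assumption of this sketch)


def file_checksum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoint files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_checkpoint(checkpoint_dir):
    """Master side: list each checkpoint file with its size and checksum."""
    metas = []
    for name in sorted(os.listdir(checkpoint_dir)):
        path = os.path.join(checkpoint_dir, name)
        if os.path.isfile(path):
            metas.append(FileMeta(name, os.path.getsize(path), file_checksum(path)))
    return metas


def files_to_fetch(remote_metas, local_dir):
    """Follower side: names of files that are missing or do not match the master."""
    pending = []
    for meta in remote_metas:
        path = os.path.join(local_dir, meta.name)
        if not os.path.isfile(path):
            pending.append(meta.name)   # never downloaded
        elif os.path.getsize(path) != meta.size:
            pending.append(meta.name)   # interrupted mid-file
        elif file_checksum(path) != meta.checksum:
            pending.append(meta.name)   # same size, different content
    return pending
```

Checking the size before the checksum lets the follower reject a truncated file without hashing it; the checksum is only computed for files whose size already matches.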

Task List

  • [x] https://github.com/apache/incubator-pegasus/pull/2238

empiredan • Apr 28 '25 08:04