
Implement chunk comparison and selective extraction for borg extract (#5638)

Open alighazi288 opened this issue 11 months ago • 6 comments

Archive File Chunk Comparison and Extraction

This implementation provides efficient file restoration from archives by comparing and extracting chunks. Instead of blindly extracting entire files, it:

  1. Compares existing file content with archived chunks
  2. Only fetches and updates chunks that differ
  3. Handles various edge cases:
    • Partial chunks at end of files
    • Files longer/shorter than archive version
    • Empty files
    • Cross-chunk boundary changes
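
As a rough illustration of the steps above (not the code from this PR), a minimal sketch of an in-place update might look like the following. The `(chunk_id, size)` list format, the `fetch_chunk` callable and the plain SHA-256 `chunk_id` helper are all assumptions made for the example; borg's real chunk IDs and repository API differ.

```python
import hashlib
import os


def chunk_id(data: bytes) -> bytes:
    # Hypothetical stand-in for a repository chunk ID (plain SHA-256 here).
    return hashlib.sha256(data).digest()


def restore_in_place(path, archived_chunks, fetch_chunk):
    """Make the file at `path` match the archived chunk list.

    archived_chunks: list of (chunk_id, size) tuples in file order (assumed format).
    fetch_chunk: callable returning the chunk bytes for a chunk ID (assumed).
    Returns the number of chunks that had to be fetched.
    """
    fetched = 0
    offset = 0
    # Open read/write and create the file if it does not exist yet.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as f:
        for cid, size in archived_chunks:
            f.seek(offset)
            local = f.read(size)
            # A short read (local file ends early) or a differing hash
            # means this chunk must come from the repository.
            if len(local) != size or chunk_id(local) != cid:
                f.seek(offset)
                f.write(fetch_chunk(cid))
                fetched += 1
            offset += size
        # Drop trailing data if the local file is longer than the archived one;
        # for an empty archived file this truncates to zero bytes.
        f.truncate(offset)
    return fetched
```

Comparing at the archived chunk offsets means a partial final chunk needs no special handling, since its recorded size is simply smaller. A change that straddles a chunk boundary makes both neighbouring chunks mismatch, so both are refetched.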

alighazi288, Jan 12 '25 05:01

@ThomasWaldmann Since this is a significant change, I wanted to open this PR as a draft to get your feedback before proceeding further.

Could you please review the current implementation and provide any suggestions or improvements?

alighazi288, Jan 12 '25 05:01

Codecov Report

Attention: Patch coverage is 78.26087% with 10 lines in your changes missing coverage. Please review.

Project coverage is 81.80%. Comparing base (1559a1e) to head (57760ef). Report is 17 commits behind head on master.

Files with missing lines    Patch %    Lines
src/borg/archive.py         78.26%     6 Missing and 4 partials :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8632      +/-   ##
==========================================
- Coverage   81.83%   81.80%   -0.04%     
==========================================
  Files          74       74              
  Lines       13319    13393      +74     
  Branches     1963     1981      +18     
==========================================
+ Hits        10900    10956      +56     
- Misses       1755     1767      +12     
- Partials      664      670       +6     


codecov[bot], Jan 12 '25 05:01

Done.

alighazi288, Jan 13 '25 05:01

@ThomasWaldmann This is getting a bit too advanced for my understanding, but I've still tried to implement and verify the attribute restoration. I'm still stuck on the recreate_cmd test failures - would appreciate some guidance there.

alighazi288, Jan 27 '25 00:01

@ThomasWaldmann any advice?

alighazi288, Feb 03 '25 21:02

BTW, an alternative way to solve the "we need default / cleared metadata" issue would be to NOT reuse the complete local file (metadata + data), but create a new temp file and then either:

  • copy some content from the existing local file
  • copy some content from the repo

After that, apply the metadata to the temp file and finally replace the existing local file with the temp file.

This is more expensive than an in-place update of an existing file (due to the data copying), but has the advantage that the file state will be atomically updated when the rename happens, so the file at the original place/path will never be in an "intermediate" state, but either be in the old state or in the fully extracted state.
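
A minimal sketch of that temp-file approach, under stated assumptions, could look like this. The `(chunk_id, size)` list, the `fetch_chunk` callable, the SHA-256 `chunk_id` helper and the `mode` / `mtime_ns` parameters are hypothetical and only stand in for the real chunk list and item metadata. A real implementation would also restore ownership, xattrs, ACLs and flags, and would probably want an fsync before the rename.

```python
import contextlib
import hashlib
import os
import tempfile


def chunk_id(data: bytes) -> bytes:
    # Hypothetical stand-in for a repository chunk ID (plain SHA-256 here).
    return hashlib.sha256(data).digest()


def restore_via_temp(path, archived_chunks, fetch_chunk, mode=0o644, mtime_ns=None):
    """Build the restored file in a temp file, then rename it over `path`.

    Returns the number of chunks fetched from the repository.
    """
    fetched = 0
    # Same directory as the target, so the final rename stays on one filesystem.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix=".restore-")
    try:
        with os.fdopen(fd, "wb") as tmp, contextlib.ExitStack() as stack:
            try:
                old = stack.enter_context(open(path, "rb"))
            except FileNotFoundError:
                old = None  # nothing to reuse, every chunk comes from the repo
            offset = 0
            for cid, size in archived_chunks:
                local = b""
                if old is not None:
                    old.seek(offset)
                    local = old.read(size)
                if len(local) == size and chunk_id(local) == cid:
                    tmp.write(local)  # copy content from the existing local file
                else:
                    tmp.write(fetch_chunk(cid))  # copy content from the repo
                    fetched += 1
                offset += size
        # The temp file starts with fresh default metadata; apply archived values.
        os.chmod(tmp_path, mode)
        if mtime_ns is not None:
            os.utime(tmp_path, ns=(mtime_ns, mtime_ns))
        # Rename over the target: readers see the old or the new file, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    return fetched
```

Because the temp file is created fresh, no stale metadata from the old local file carries over, which is the "default / cleared metadata" property mentioned above. The cost is that unchanged chunks are still copied once on the local disk.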

ThomasWaldmann, Mar 02 '25 16:03