Implement chunk comparison and selective extraction for borg extract (#5638)
Archive File Chunk Comparison and Extraction
This implementation restores files from archives more efficiently by comparing the existing on-disk file against the archived chunks. Instead of unconditionally extracting entire files, it:
- Compares existing file content with archived chunks
- Only fetches and updates chunks that differ
- Handles various edge cases:
  - Partial chunks at end of files
  - Files longer/shorter than archive version
  - Empty files
  - Cross-chunk boundary changes
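To make the idea concrete for reviewers, here is a minimal standalone sketch of the compare-then-fetch loop. It is not the code in this PR and not borg's API: `sync_file_to_chunks`, `chunk_id`, and the `fetch` callback are hypothetical names, the chunk list is assumed to be `(id, size)` pairs, and plain SHA-256 stands in for the repository's real chunk id function.

```python
import hashlib
import os


def chunk_id(data):
    # Assumption: plain SHA-256 as a stand-in for the repository's id function.
    return hashlib.sha256(data).digest()


def sync_file_to_chunks(path, chunks, fetch):
    """Update the file at `path` in place so it matches `chunks`.

    `chunks` is a list of (id, size) pairs in file order (hypothetical format).
    `fetch(id)` returns the chunk's bytes and is only called for chunks that
    differ from the local content. Returns the number of fetched chunks.
    """
    if not os.path.exists(path):
        # "r+b" needs an existing file, so start from an empty one.
        open(path, "wb").close()
    fetched = 0
    offset = 0
    with open(path, "r+b") as f:
        for cid, size in chunks:
            f.seek(offset)
            local = f.read(size)  # short read if the local file ends early
            if len(local) != size or chunk_id(local) != cid:
                f.seek(offset)
                f.write(fetch(cid))
                fetched += 1
            offset += size
        # Drop trailing bytes if the local file was longer than the archived one.
        f.truncate(offset)
    return fetched
```

A short read at the end of a shorter local file is treated as a mismatch, and the final `truncate` covers both longer local files and empty archived files (an empty chunk list truncates to zero bytes).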
@ThomasWaldmann Since this is a significant change, I wanted to open this PR as a draft to get your feedback before proceeding further.
Could you please review the current implementation and provide any suggestions or improvements?
Codecov Report
Attention: Patch coverage is 78.26087% with 10 lines in your changes missing coverage. Please review.
Project coverage is 81.80%. Comparing base (1559a1e) to head (57760ef). Report is 17 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/borg/archive.py | 78.26% | 6 Missing and 4 partials :warning: |
Additional details and impacted files
| Coverage Diff | master | #8632 | +/- |
|---|---|---|---|
| Coverage | 81.83% | 81.80% | -0.04% |
| Files | 74 | 74 | |
| Lines | 13319 | 13393 | +74 |
| Branches | 1963 | 1981 | +18 |
| Hits | 10900 | 10956 | +56 |
| Misses | 1755 | 1767 | +12 |
| Partials | 664 | 670 | +6 |
Done.
@ThomasWaldmann This is getting a bit too advanced for my understanding, but I've still tried to implement and verify the attribute restoration. I'm still stuck on the recreate_cmd test failures - would appreciate some guidance there.
@ThomasWaldmann any advice?
BTW, an alternative way to solve the "we need default / cleared metadata" issue would be to NOT reuse the complete local file (metadata + data), but create a new temp file and then either:
- copy some content from the existing local file
- copy some content from the repo
After that, apply the metadata to the temp file and finally replace the existing local file with the temp file.
This is more expensive than an in-place update of an existing file (due to the data copying), but has the advantage that the file state will be atomically updated when the rename happens, so the file at the original place/path will never be in an "intermediate" state, but either be in the old state or in the fully extracted state.
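A rough sketch of that temp-file variant, in case it helps the discussion. Everything here is illustrative rather than borg code: `extract_atomically`, `chunk_id`, and `fetch` are hypothetical names, chunks are assumed to be `(id, size)` pairs, SHA-256 stands in for the real id function, and the metadata handling is reduced to mode and mtime.

```python
import hashlib
import os
import tempfile


def chunk_id(data):
    # Assumption: plain SHA-256 as a stand-in for the repository's id function.
    return hashlib.sha256(data).digest()


def extract_atomically(path, chunks, fetch, mode=0o644, mtime_ns=None):
    """Build the new file content in a temp file, then rename it over `path`.

    Matching chunks are copied from the existing local file, all others are
    obtained via `fetch(id)`. Returns the number of fetched chunks.
    """
    target_dir = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    fetched = 0
    try:
        with os.fdopen(fd, "wb") as tmp:
            old = open(path, "rb") if os.path.exists(path) else None
            try:
                offset = 0
                for cid, size in chunks:
                    local = b""
                    if old is not None:
                        old.seek(offset)
                        local = old.read(size)
                    if len(local) == size and chunk_id(local) == cid:
                        tmp.write(local)  # reuse content from the local file
                    else:
                        tmp.write(fetch(cid))  # get content from the repo
                        fetched += 1
                    offset += size
            finally:
                if old is not None:
                    old.close()
        # The temp file starts with fresh metadata; apply the archived values.
        os.chmod(tmp_path, mode)
        if mtime_ns is not None:
            os.utime(tmp_path, ns=(mtime_ns, mtime_ns))
        # The path now switches from the old state to the fully extracted state.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return fetched
```

Because the temp file is newly created, there is no leftover metadata from the old file to clear, which is the point of this approach. The cost is the extra copy of every unchanged chunk.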