Support runtime chunk deduplication
Details
This PR enhances nydusd to support runtime chunk deduplication. It works as follows:
- Use a sqlite database to record information about the decompressed/plaintext chunks available on the local node.
- When a chunk is not ready in the uncompressed data blob file, query the sqlite database for a chunk with the same chunk digest. If one exists, copy the decompressed chunk from the source data blob file to the target data blob file by using copy_file_range().
- Otherwise, download the compressed chunk from remote, decompress it, write it to the target data blob file, and add a record for the chunk to the database.
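The lookup flow above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DedupIndex` (a `HashMap` standing in for the sqlite database), `BlobStore` (byte vectors standing in for data blob files), `ChunkLocation`, `Source`, and `ensure_chunk` are all hypothetical names, and a plain in-memory copy stands in for copy_file_range().

```rust
use std::collections::HashMap;

/// Where the bytes for a chunk came from (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum Source {
    AlreadyReady,
    LocalCopy,
    RemoteDownload,
}

/// Location of a plaintext chunk inside an uncompressed data blob.
#[derive(Clone, Debug, PartialEq)]
struct ChunkLocation {
    blob_id: String,
    offset: usize,
    len: usize,
}

/// In-memory stand-in for the sqlite database: chunk digest -> location.
#[derive(Default)]
struct DedupIndex {
    by_digest: HashMap<String, ChunkLocation>,
}

/// In-memory stand-in for the uncompressed data blob files.
#[derive(Default)]
struct BlobStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl BlobStore {
    /// Write `data` at `offset`, growing the blob with zeros if needed.
    fn write(&mut self, blob_id: &str, offset: usize, data: &[u8]) {
        let blob = self.blobs.entry(blob_id.to_string()).or_default();
        if blob.len() < offset + data.len() {
            blob.resize(offset + data.len(), 0);
        }
        blob[offset..offset + data.len()].copy_from_slice(data);
    }

    /// Read the bytes at `loc`, or None if the blob or range is missing.
    fn read(&self, loc: &ChunkLocation) -> Option<Vec<u8>> {
        let blob = self.blobs.get(&loc.blob_id)?;
        blob.get(loc.offset..loc.offset + loc.len).map(|s| s.to_vec())
    }
}

/// Make sure the chunk at `target` holds plaintext data.
/// `fetch_remote` stands in for "download and decompress".
fn ensure_chunk<F>(
    index: &mut DedupIndex,
    store: &mut BlobStore,
    digest: &str,
    target: &ChunkLocation,
    ready: bool,
    fetch_remote: F,
) -> Source
where
    F: FnOnce() -> Vec<u8>,
{
    if ready {
        return Source::AlreadyReady;
    }
    // Step 1: look for a local chunk with the same digest and copy it.
    // A real implementation would call copy_file_range() here.
    if let Some(src) = index.by_digest.get(digest).cloned() {
        if let Some(data) = store.read(&src) {
            store.write(&target.blob_id, target.offset, &data);
            return Source::LocalCopy;
        }
    }
    // Step 2: fall back to the remote, then record the new chunk.
    let data = fetch_remote();
    store.write(&target.blob_id, target.offset, &data);
    index.by_digest.insert(digest.to_string(), target.clone());
    Source::RemoteDownload
}
```

In this sketch the first request for a digest goes to the remote and is recorded, and a later request for the same digest in a different blob is served by a local copy without invoking `fetch_remote`.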
So chunk deduplication provides two kinds of savings:
- saving network bandwidth when the chunk is available on the local node, because the compressed chunk data does not need to be downloaded from remote.
- saving local disk space if the underlying filesystem supports reflinks (shared extents). If the filesystem storing the data blob files supports reflinks, copy_file_range() can share the underlying extents instead of copying data, thus reducing local storage consumption.
Types of changes
What types of changes does your PullRequest introduce? Put an x in all the boxes that apply:
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation Update (if none of the other choices apply)
Checklist
Go over all the following points, and put an x in all the boxes that apply.
- [x] I have updated the documentation accordingly.
- [x] I have added tests to cover my changes.
Codecov Report
Merging #1507 (7d287c9) into master (06755fe) will increase coverage by 0.02%. The diff coverage is 66.51%.
Additional details and impacted files
@@ Coverage Diff @@
## master #1507 +/- ##
==========================================
+ Coverage 62.72% 62.74% +0.02%
==========================================
Files 129 129
Lines 44153 44360 +207
Branches 44153 44360 +207
==========================================
+ Hits 27695 27834 +139
- Misses 15087 15144 +57
- Partials 1371 1382 +11
| Files | Coverage Δ | |
|---|---|---|
| storage/src/cache/dedup/db.rs | 79.09% <100.00%> (+0.08%) | :arrow_up: |
| storage/src/cache/mod.rs | 57.84% <ø> (ø) | |
| utils/src/digest.rs | 91.53% <0.00%> (-0.53%) | :arrow_down: |
| storage/src/cache/filecache/mod.rs | 67.58% <66.66%> (+0.08%) | :arrow_up: |
| storage/src/cache/fscache/mod.rs | 75.92% <63.63%> (-0.47%) | :arrow_down: |
| storage/src/utils.rs | 93.59% <78.94%> (-2.30%) | :arrow_down: |
| storage/src/cache/cachedfile.rs | 33.14% <0.00%> (-0.44%) | :arrow_down: |
| src/bin/nydusd/main.rs | 0.18% <0.00%> (-0.01%) | :arrow_down: |
| storage/src/cache/dedup/mod.rs | 72.72% <82.75%> (+72.72%) | :arrow_up: |
Hi all, I tried out this feature and it seems to work as expected. Is there something preventing it from being merged?
cc @jiangliu, any updates so we can continue with this? :)