nydus
local cas: achieve chunk deduplication for nydus.
Relevant Issue (if applicable)
If there are issues related to this pull request, please list them.
Details
Original version from https://github.com/dragonflyoss/image-service/pull/956. The previous version of local CAS was static dedup, which only modified the chunk information in the bootstrap. It had a serious problem: it could reuse chunks that the backend of the current image cannot fetch, leaving the container unable to load the corresponding chunk data on demand at runtime. To address this, dynamic dedup was introduced: when nydusd initializes the blob cache, it reads the blob's corresponding backend configuration from the CAS database, enabling the blob cache to read chunk data from other backends.
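To make the dynamic-dedup lookup concrete, here is a minimal sketch of the idea (names and schema are illustrative assumptions, not the actual nydus API): a CAS database maps a chunk digest to the blob that already holds it plus that blob's serialized backend configuration, so at blob-cache initialization a reused chunk can be fetched from the *other* blob's backend.

```python
# Hypothetical sketch of the CAS database described above; all names are
# illustrative, not the real nydus schema.
import sqlite3

def init_cas(conn):
    """Create the chunk table: digest -> (owning blob, backend config)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "  chunk_digest TEXT PRIMARY KEY,"
        "  blob_id      TEXT NOT NULL,"
        "  backend_cfg  TEXT NOT NULL)"  # serialized backend config (e.g. JSON)
    )

def record_chunk(conn, digest, blob_id, backend_cfg):
    """Register a chunk the first time any blob provides it."""
    conn.execute(
        "INSERT OR IGNORE INTO chunks VALUES (?, ?, ?)",
        (digest, blob_id, backend_cfg),
    )

def lookup_backend(conn, digest):
    """Return (blob_id, backend_cfg) for a deduplicated chunk, or None."""
    return conn.execute(
        "SELECT blob_id, backend_cfg FROM chunks WHERE chunk_digest = ?",
        (digest,),
    ).fetchone()
```

The key design point is that the database stores the backend configuration alongside the chunk location, which is exactly what lets the blob cache reach a backend other than the current image's.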
What's more, the mismatch between dynamic dedup and nydus' chunk amplification can result in a larger cache size with dedup than without it: chunk amplification can cause reused chunks to be pulled multiple times. To address this, a dedup_bitmap was introduced. When RAFS is initialized, the dedup_bitmap is generated from the chunk information in the blob, and whether a chunk in a blob is ready is then decided jointly by the chunk map and the dedup bitmap.
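The joint decision can be sketched as follows. This is an illustrative model only (not the nydus code), under the hedged assumption that a chunk needs no fetch when its data is already cached locally, or when the dedup bitmap marks it as deduplicated to another blob, so amplification will not re-pull it.

```python
# Illustrative sketch of the chunk-map + dedup-bitmap joint readiness check.
class DedupBitmap:
    """One flag per chunk in a blob; True means the chunk was deduplicated."""

    def __init__(self, chunk_count):
        self.bits = bytearray(chunk_count)

    def mark_deduped(self, idx):
        self.bits[idx] = 1

    def is_deduped(self, idx):
        return bool(self.bits[idx])

def chunk_needs_fetch(chunk_map_ready, dedup_bitmap, idx):
    # Joint decision: fetch only if neither the chunk map nor the dedup
    # bitmap says the chunk is already covered.
    return not (chunk_map_ready(idx) or dedup_bitmap.is_deduped(idx))
```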
Types of changes
What types of changes does your pull request introduce? Put an `x` in all the boxes that apply:
- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [x] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation Update (if none of the other choices apply)
Checklist
Go over all the following points, and put an x in all the boxes that apply.
- [ ] I have updated the documentation accordingly.
- [ ] I have added tests to cover my changes.
Codecov Report
Merging #1399 (6c5147f) into master (0916979) will decrease coverage by 0.90%. The diff coverage is 31.95%.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #1399      +/-   ##
==========================================
- Coverage   62.70%   61.81%   -0.90%
==========================================
  Files         123      125       +2
  Lines       43248    44530    +1282
  Branches    43248    44530    +1282
==========================================
+ Hits        27120    27526     +406
- Misses      14817    15660     +843
- Partials     1311     1344      +33
```
| Files | Coverage Δ | |
|---|---|---|
| builder/src/core/v6.rs | 75.65% <100.00%> (ø) | |
| builder/src/lib.rs | 64.79% <ø> (ø) | |
| utils/src/lib.rs | 98.87% <ø> (ø) | |
| storage/src/cache/dummycache.rs | 94.33% <80.00%> (-0.21%) | :arrow_down: |
| utils/src/compress/mod.rs | 97.83% <0.00%> (ø) | |
| rafs/src/metadata/mod.rs | 70.99% <0.00%> (ø) | |
| utils/src/crypt.rs | 93.04% <0.00%> (ø) | |
| utils/src/digest.rs | 92.06% <0.00%> (ø) | |
| builder/src/compact.rs | 80.32% <0.00%> (-0.25%) | :arrow_down: |
| rafs/src/metadata/cached_v5.rs | 80.76% <0.00%> (-0.30%) | :arrow_down: |
| ... and 21 more | | |
@xwb1136021767 Sorry for the delayed reply. Please help add some docs about the background and usage for users.
This is a particularly cool feature that I think would benefit everyone who deals with multiple random images that essentially share the same stuff that's used at runtime (e.g. libpython3.so) but built by different methodologies (different base images etc.), and it would optimize cold starts in a very intriguing way. Any chance of knowing what the next steps are for this PR to get it streamlined, aside from the existing reviews (which all seem easily addressable, so I am not sure if they are a blocker)? Is the concept itself OK for the nydus maintainers? Is there any help needed that I could contribute (I am not super knowledgeable on the nydus side, but would be happy to dive deep into fixing the easy stuff like the existing reviews) to push this forward?
@xwb1136021767 Can we also give a performance test result in the doc to give users confidence?
This idea is warmly welcomed :) The only issue is that it is a little big to review; we will put more effort into this.
@xwb1136021767 While reviewing the PR, I have worked out some enhancements/fixes over your work. How about holding on for a while, and then I will try to submit my patches to your repo?
Thank you so much for your help!
I can't understand why we need to rebuild a new bootstrap. Can we do runtime de-duplication by enhancing the blob cache instead, e.g. by using a global database of (chunk_digest, chunk_info) records?
I have also considered the solution you mentioned. My concern was whether runtime deduplication would make the CAS database a performance bottleneck in IO-intensive scenarios. There are indeed problems with the current solution, and personally I think the biggest one is how to combine it with the image GC mechanism, especially in localfs or localdisk mode, because the nydus snapshotter cannot prevent the deletion of an image.
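For context, the runtime alternative being discussed can be sketched roughly as follows: instead of rebuilding the bootstrap, the blob cache consults a shared chunk_digest -> data index on a miss and reuses bytes another blob already fetched. All names here are hypothetical illustrations, not the nydus API.

```python
# Hypothetical sketch of runtime dedup in the blob cache.
class DedupBlobCache:
    def __init__(self, global_index):
        self.local = {}                   # chunks cached for this blob
        self.global_index = global_index  # shared chunk_digest -> data index

    def fetch_from_backend(self, digest):
        raise NotImplementedError("placeholder for a real backend read")

    def read_chunk(self, digest):
        if digest in self.local:
            return self.local[digest]
        data = self.global_index.get(digest)   # runtime dedup hit?
        if data is None:
            data = self.fetch_from_backend(digest)
            self.global_index[digest] = data   # publish for other blobs
        self.local[digest] = data
        return data
```

A real implementation would persist the shared index (e.g. in the CAS database) rather than hold it in memory, and would still need to handle eviction and the GC concerns raised in this thread.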
This may require some testing; full dedup and rebuilding the whole bootstrap (not only the chunk infos?) at boot may also affect cold-start performance. With either full dedup or on-demand dedup there will be GC issues, since containerd checks whether an image references a blob based on the image manifest. References:
https://github.com/containerd/nydus-snapshotter/blob/5009c522df583cdf76ee37ae2a3c6440d5d79797/snapshot/snapshot.go#L583
https://github.com/containerd/nydus-snapshotter/blob/5009c522df583cdf76ee37ae2a3c6440d5d79797/pkg/cache/manager.go#L70
A simplified version that implements chunk deduplication at runtime instead of at build time: #1507