local cas: achieve chunk deduplication for nydus.

Open xwb1136021767 opened this issue 2 years ago • 11 comments

Relevant Issue (if applicable)

If there are Issues related to this PullRequest, please list it.

Details

Original version: https://github.com/dragonflyoss/image-service/pull/956. The previous version of local CAS used static dedup, which only modified the chunk information in the bootstrap. This has a serious problem: it may reuse chunks that the backend of the current image cannot provide, so the container cannot load the corresponding chunk data on demand at runtime. To address this, dynamic dedup is introduced. When nydusd initializes the blob cache, it reads the blob's backend configuration from the CAS database, which allows the blob cache to read chunk data from other backends.
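The lookup that dynamic dedup relies on can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in this PR: `CasDb`, `BackendConfig`, `ChunkLocation` and their fields are hypothetical names, and the real CAS database is persistent and stores richer configuration.

```rust
use std::collections::HashMap;

/// Hypothetical backend description; the real nydus backend config is richer.
#[derive(Clone, Debug, PartialEq)]
pub struct BackendConfig {
    pub kind: String,     // e.g. "registry" or "localfs"
    pub location: String, // e.g. a repository or a directory
}

/// Where a deduplicated chunk actually lives.
#[derive(Clone, Debug, PartialEq)]
pub struct ChunkLocation {
    pub blob_id: String,
    pub offset: u64,
    pub size: u32,
}

/// In-memory stand-in for the CAS database (assumption: two simple tables).
#[derive(Default)]
pub struct CasDb {
    backends: HashMap<String, BackendConfig>, // blob id -> backend config
    chunks: HashMap<String, ChunkLocation>,   // chunk digest -> location
}

impl CasDb {
    /// Remember which backend serves a blob.
    pub fn add_blob(&mut self, blob_id: &str, backend: BackendConfig) {
        self.backends.insert(blob_id.to_string(), backend);
    }

    /// Record a chunk only if its digest is not known yet (first writer wins).
    pub fn add_chunk(&mut self, digest: &str, loc: ChunkLocation) {
        self.chunks.entry(digest.to_string()).or_insert(loc);
    }

    /// Resolve a digest to its location and the backend able to serve it.
    /// Returns None when the backend is unknown, which is the failure case
    /// that static dedup could not detect.
    pub fn resolve(&self, digest: &str) -> Option<(&ChunkLocation, &BackendConfig)> {
        let loc = self.chunks.get(digest)?;
        let backend = self.backends.get(&loc.blob_id)?;
        Some((loc, backend))
    }
}
```

With such a table, a blob cache that misses a chunk locally could call `resolve` and fetch the data from the backend of the blob that first stored the chunk, instead of assuming the current image's backend holds it.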

In addition, dynamic dedup interacts badly with nydus' chunk amplification: amplification can cause reused chunks to be pulled multiple times, so the cache can end up larger with dedup enabled than without it. To address this, a dedup_bitmap is introduced. When rafs is initialized, the dedup_bitmap is generated from the chunk information in the blob. Whether a chunk in a blob is ready is then decided jointly by the chunk map and the dedup bitmap.
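One way to picture the joint decision is the sketch below. It assumes that "joint" means a chunk counts as ready when either the chunk map or the dedup bitmap has its bit set; that reading, and the names `Bitmap`, `BlobState` and `chunks_to_fetch`, are illustrative assumptions rather than the PR's actual types.

```rust
/// Simple fixed-size bitmap (hypothetical helper, not a nydus type).
pub struct Bitmap {
    bits: Vec<u64>,
    len: usize,
}

impl Bitmap {
    pub fn new(len: usize) -> Self {
        Bitmap { bits: vec![0; (len + 63) / 64], len }
    }

    pub fn set(&mut self, idx: usize) {
        assert!(idx < self.len, "chunk index out of range");
        self.bits[idx / 64] |= 1u64 << (idx % 64);
    }

    /// Out-of-range indices are reported as not set.
    pub fn get(&self, idx: usize) -> bool {
        idx < self.len && (self.bits[idx / 64] & (1u64 << (idx % 64))) != 0
    }
}

/// Per-blob readiness state: chunks cached for this blob, plus chunks that
/// dedup has redirected to data already held elsewhere.
pub struct BlobState {
    pub chunk_map: Bitmap,
    pub dedup_bitmap: Bitmap,
}

impl BlobState {
    /// Assumption: a chunk is ready if either bitmap marks it.
    pub fn is_ready(&self, idx: usize) -> bool {
        self.chunk_map.get(idx) || self.dedup_bitmap.get(idx)
    }

    /// Chunks in [start, end) that an amplified request still has to fetch.
    pub fn chunks_to_fetch(&self, start: usize, end: usize) -> Vec<usize> {
        (start..end).filter(|&i| !self.is_ready(i)).collect()
    }
}
```

Under this reading, an amplified request that spans deduplicated chunks skips them instead of pulling them again, which is what keeps the cache from growing past the non-dedup size.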

Types of changes

What types of changes does your PullRequest introduce? Put an x in all the boxes that apply:

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [x] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation Update (if none of the other choices apply)

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • [ ] I have updated the documentation accordingly.
  • [ ] I have added tests to cover my changes.

xwb1136021767 avatar Aug 09 '23 03:08 xwb1136021767

Codecov Report

Merging #1399 (6c5147f) into master (0916979) will decrease coverage by 0.90%. The diff coverage is 31.95%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1399      +/-   ##
==========================================
- Coverage   62.70%   61.81%   -0.90%     
==========================================
  Files         123      125       +2     
  Lines       43248    44530    +1282     
  Branches    43248    44530    +1282     
==========================================
+ Hits        27120    27526     +406     
- Misses      14817    15660     +843     
- Partials     1311     1344      +33     
Files                              Coverage Δ
builder/src/core/v6.rs             75.65% <100.00%> (ø)
builder/src/lib.rs                 64.79% <ø> (ø)
utils/src/lib.rs                   98.87% <ø> (ø)
storage/src/cache/dummycache.rs    94.33% <80.00%> (-0.21%) :arrow_down:
utils/src/compress/mod.rs          97.83% <0.00%> (ø)
rafs/src/metadata/mod.rs           70.99% <0.00%> (ø)
utils/src/crypt.rs                 93.04% <0.00%> (ø)
utils/src/digest.rs                92.06% <0.00%> (ø)
builder/src/compact.rs             80.32% <0.00%> (-0.25%) :arrow_down:
rafs/src/metadata/cached_v5.rs     80.76% <0.00%> (-0.30%) :arrow_down:
... and 21 more

... and 1 file with indirect coverage changes

codecov[bot] avatar Aug 09 '23 05:08 codecov[bot]

@xwb1136021767 Sorry for the delayed reply, please help to add some docs about background and usage for users.

imeoer avatar Oct 24 '23 06:10 imeoer

This is a particularly cool feature that I think would benefit everyone who deals with multiple random images that essentially share the same stuff that's used at runtime (e.g. libpython3.so) but are built by different methodologies (different base images, etc.), and it would optimize cold starts in a very intriguing way. Any chance of knowing what the next steps are for this PR to get it streamlined, aside from the existing reviews (which all seem easily addressable, so I am not sure if they are a blocker)? Is the concept itself OK for the nydus maintainers? Is there any help needed that I could contribute (I am not super knowledgeable on the nydus side, but would be happy to dive deep into fixing the easy stuff like the existing reviews) to push this forward?

isidentical avatar Nov 06 '23 16:11 isidentical

@xwb1136021767 Can we also give a performance test result in the doc to give users confidence?

imeoer avatar Nov 20 '23 06:11 imeoer

> This is a particularly cool feature that I think would benefit everyone who deals with multiple random images that essentially share the same stuff that's used at runtime (e.g. libpython3.so) but are built by different methodologies (different base images, etc.), and it would optimize cold starts in a very intriguing way. Any chance of knowing what the next steps are for this PR to get it streamlined, aside from the existing reviews (which all seem easily addressable, so I am not sure if they are a blocker)? Is the concept itself OK for the nydus maintainers? Is there any help needed that I could contribute (I am not super knowledgeable on the nydus side, but would be happy to dive deep into fixing the easy stuff like the existing reviews) to push this forward?

This idea is warmly welcomed :) The only issue is that the PR is a little big to review; I will put more effort into this.

jiangliu avatar Nov 27 '23 01:11 jiangliu

@xwb1136021767 While reviewing the PR, I have worked out some enhancements/fixes on top of your work. How about holding on for a while, and then I will try to submit my patches to your repo?

jiangliu avatar Nov 28 '23 06:11 jiangliu

> @xwb1136021767 While reviewing the PR, I have worked out some enhancements/fixes on top of your work. How about holding on for a while, and then I will try to submit my patches to your repo?

Thank you so much for your help!

xwb1136021767 avatar Nov 28 '23 06:11 xwb1136021767

> I can't understand why we need to rebuild a new bootstrap. Can we do runtime de-duplication by enhancing the blob cache, for example by using global (chunk_digest, chunk_info) DB records?

I have also considered the solution you mentioned before. My previous concern was whether runtime deduplication would make the CAS database a performance bottleneck in IO intensive scenarios. There are indeed problems with the current solution, and personally, I think the biggest problem is how to combine it with the GC mechanism of the image, especially in localfs or localdisk mode, because nydus snapshot cannot prevent the deletion operation of the image.

xwb1136021767 avatar Nov 29 '23 02:11 xwb1136021767

> I have also considered the solution you mentioned before. My previous concern was whether runtime deduplication would make the CAS database a performance bottleneck in IO intensive scenarios. There are indeed problems with the current solution, and personally, I think the biggest problem is how to combine it with the GC mechanism of the image, especially in localfs or localdisk mode, because nydus snapshot cannot prevent the deletion operation of the image.

This may require some testing. Full dedup and rebuilding the whole bootstrap (not only the chunk infos?) on boot may also affect cold start performance. With either full dedup or on-demand dedup there will be GC issues, because containerd decides whether an image references a blob based on the image manifest. References:

https://github.com/containerd/nydus-snapshotter/blob/5009c522df583cdf76ee37ae2a3c6440d5d79797/snapshot/snapshot.go#L583

https://github.com/containerd/nydus-snapshotter/blob/5009c522df583cdf76ee37ae2a3c6440d5d79797/pkg/cache/manager.go#L70
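The GC hazard can be illustrated with a small sketch. It assumes a simplified model in which GC keeps exactly the blobs listed in the manifests of the remaining images; `blobs_at_risk` and its parameters are hypothetical and do not correspond to containerd or nydus-snapshotter APIs.

```rust
use std::collections::{BTreeSet, HashMap};

/// Blobs that a manifest-based GC would delete although chunks of a
/// remaining image were deduplicated against them.
///
/// `manifests`:  image name -> blob ids listed in that image's manifest
/// `dedup_refs`: (image name, blob id its chunks were redirected to)
pub fn blobs_at_risk(
    manifests: &HashMap<String, Vec<String>>,
    dedup_refs: &[(String, String)],
) -> BTreeSet<String> {
    // Everything GC considers referenced: the union of all manifest entries.
    let kept: BTreeSet<&str> = manifests
        .values()
        .flatten()
        .map(String::as_str)
        .collect();

    dedup_refs
        .iter()
        // Only images that still exist matter, and only blobs GC would not keep.
        .filter(|(image, blob)| manifests.contains_key(image) && !kept.contains(blob.as_str()))
        .map(|(_, blob)| blob.clone())
        .collect()
}
```

In this model, once the image that owns a shared blob is removed, every other image whose chunks were redirected to that blob loses its data unless the dedup references are also taken into account by GC.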

imeoer avatar Nov 29 '23 03:11 imeoer

A simplified version that implements chunk deduplication at runtime instead of at build time: #1507

jiangliu avatar Dec 07 '23 07:12 jiangliu