Use ultracdc instead of fastcdc
UltraCDC outperforms FastCDC in speed and deduplication
Description
UltraCDC aims to improve on FastCDC's deduplication efficiency by selecting chunk boundaries more effectively, which reduces storage needs, and to chunk data faster thanks to a cheaper boundary-detection step. Its adaptive chunk-size selection should also help it behave consistently across varied data types.
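To make the description concrete, here is a minimal sketch of the content-defined-chunking idea both FastCDC and UltraCDC build on: a hash is updated per byte, and a cut point is declared when the hash matches a mask. All names and constants here are illustrative, not NativeLink's actual API.

```rust
// Illustrative content-defined chunking: scan bytes, update a hash, and
// cut when (hash & mask) == 0, subject to min/max chunk-size bounds.
// Because the cut decision depends only on content, identical data
// produces identical chunks regardless of its offset in the stream,
// which is what makes deduplication work.
fn find_chunk_boundary(data: &[u8], min_size: usize, max_size: usize, mask: u64) -> usize {
    let mut hash: u64 = 0;
    for (i, &byte) in data.iter().enumerate() {
        // Simple multiplicative rolling update (illustrative only).
        hash = hash.wrapping_mul(31).wrapping_add(u64::from(byte));
        if i + 1 >= min_size && (hash & mask) == 0 {
            return i + 1; // content-defined cut point found
        }
        if i + 1 >= max_size {
            return max_size; // forced cut to bound chunk size
        }
    }
    data.len() // no boundary: the remainder is the final chunk
}
```

The real algorithms differ in how the hash is computed and how the mask is chosen, which is where UltraCDC's claimed speed and deduplication gains come from.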
Fixes https://github.com/TraceMachina/nativelink/issues/692
Type of change
- [x] New feature (non-breaking change which adds functionality)
- [x] This change requires a documentation update
How Has This Been Tested?
`cargo test --all` passes as expected; `bazel test //...` succeeds as well.
Checklist
- [x] Updated documentation if needed
- [x] Tests added/amended
- [x] `bazel test //...` passes locally
- [ ] PR is contained in a single commit, using `git amend` (see some docs)
- Why the rename? The rename from `fastcdc` to `ultracdc` reflects enhancements in speed from implementing the `ultracdc` core algorithm.
- Why change the Rust library this is based on to a Go library? There is no Rust library for `ultracdc`. There is an interesting algorithm called UltraCDC from some of the FastCDC authors that seems worth investigating: https://github.com/PlakarLabs/go-cdc-chunkers/blob/main/chunkers/ultracdc/ultracdc.go. After careful analysis, this approach has been translated into a Rust-based library.
- What performance is this improving? Efficiency and speed: the transition could lead to performance improvements in processing speed, memory usage, or scalability. Optimization: `ultracdc` could include optimizations that weren't available or possible in `fastcdc`, such as better chunking algorithms, more efficient data handling, or improvements in how data is processed and transferred.
I hope the UltraCDC implementation can be reviewed soon. cc: @allada, @MarcusSorealheis
Removed fastcdc.rs to better compare differences with the newly implemented ultracdc algorithm in this PR. The main difference in this UltraCDC implementation is the use of a rolling hash function to determine chunk boundaries. The rolling hash value is compared against a dynamically selected mask to decide when to split the chunk. This approach allows for more flexible and potentially more accurate chunking compared to the simpler hash-based method used in FastCDC. cc: @allada, @MarcusSorealheis
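The "dynamically selected mask" described above can be sketched as follows. This is a hypothetical illustration of the idea, not the code in this PR: a stricter mask (more bits) before the target chunk size makes early cuts unlikely, and a looser mask (fewer bits) after it makes late cuts likely, centering the chunk-size distribution around the target.

```rust
// Hypothetical normalized-chunking mask selection. The constants are
// illustrative: a 16-bit mask gives ~1/65536 odds of a cut per byte,
// a 10-bit mask ~1/1024, so chunks rarely end before the target size
// and rarely run long after it.
const MASK_STRICT: u64 = (1 << 16) - 1; // used before the target size
const MASK_LOOSE: u64 = (1 << 10) - 1; // used after the target size

fn mask_for(offset: usize, target_size: usize) -> u64 {
    if offset < target_size {
        MASK_STRICT
    } else {
        MASK_LOOSE
    }
}

fn is_boundary(rolling_hash: u64, offset: usize, target_size: usize) -> bool {
    (rolling_hash & mask_for(offset, target_size)) == 0
}
```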
nativelink-util/src/ultracdc.rs line 84 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: This should take `&[u8]` instead of `BytesMut`.
Done
nativelink-util/src/ultracdc.rs line 88 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
it looks like this is vulnerable to overflows. Can we instead use one of the safe functions like `.saturating_add`, `.overflowing_add`, or similar?
Done.
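For readers unfamiliar with the overflow issue raised above, here is a short illustration (not the actual diff in this PR). Plain `+` on integers panics on overflow in debug builds and wraps silently in release builds; for a rolling hash, wraparound is usually the intended behavior, so making it explicit avoids both the panic and the ambiguity.

```rust
// Explicitly wrapping arithmetic is well-defined on overflow, unlike
// plain `+`/`*`, which panic in debug builds when they overflow.
// (`saturating_add` clamps at u64::MAX instead, which suits counters
// better than hashes.) This helper is illustrative, not the PR's code.
fn update_hash(hash: u64, byte: u8) -> u64 {
    hash.wrapping_mul(31).wrapping_add(u64::from(byte))
}
```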
nativelink-util/src/ultracdc.rs line 115 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
this appears to be unused now.
Done.
nativelink-util/src/ultracdc.rs line 74 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: Make these `u64` and do the casting here so we don't pay for casting below.
Done
nativelink-util/src/ultracdc.rs line 129 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: use a slice instead, so we don't need to copy the data.
Done
nativelink-util/src/ultracdc.rs line 113 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: We don't want to copy out data; instead, use the `.split_*` functions to create the dynamic slices and return them.
Done
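The zero-copy point behind this nit can be shown with a standard-library sketch. (In NativeLink the suggestion refers to the `bytes` crate's `split_*` methods, which detach a prefix that shares the original allocation; the same no-copy principle is visible with plain slices, which is what this illustrative helper uses.)

```rust
// Zero-copy chunking with slices: `split_at` returns two borrowed views
// into the same buffer, so no bytes are copied when a chunk is handed
// back to the caller. Illustrative only; not the PR's actual code.
fn take_chunk(buf: &[u8], cut: usize) -> (&[u8], &[u8]) {
    buf.split_at(cut)
}
```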
Previously, allada (Nathan (Blaise) Bruer) wrote…
I'm concerned because this appears to be significantly slower now (but I have not measured it). We are now computing a rolling hash on every byte, making the algorithm `O(n * m)`, where `n` is the number of bytes and `m` is `window_size`. This function is already quite slow. I'm not sure it is a good idea to make it significantly slower. If we remove the copies and make it so the rolling hash can be updated instead of recomputed, it can probably be accepted.
I understand your concern about the potential performance impact of the current implementation. You're right that computing the rolling hash for every byte could lead to a time complexity of `O(n * m)`, where `n` is the number of bytes and `m` is the window size.
To optimize this, we can maintain the rolling hash in a way that only updates it when necessary, rather than recalculating it for every byte. This can be achieved by adjusting the hash incrementally as bytes are added and removed from the window, ensuring that we only perform constant-time updates.
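The incremental update described above can be sketched as follows, assuming a polynomial (Rabin-Karp style) rolling hash. This is an illustrative sketch, not the revision itself: instead of rehashing the whole window at each position, the outgoing byte's contribution is subtracted and the incoming byte's is added, making each step O(1).

```rust
// O(1)-per-byte rolling hash over a fixed window. The hash of a window
// [b0, b1, ..., b_{m-1}] is b0*B^(m-1) + b1*B^(m-2) + ... + b_{m-1}.
// Sliding the window drops b0's term and appends the new byte, so the
// full O(m) rehash happens only once at initialization.
const BASE: u64 = 257;

struct RollingHash {
    window_size: usize,
    top_power: u64, // BASE^(window_size - 1), for removing the oldest byte
    hash: u64,
}

impl RollingHash {
    fn new(window_size: usize) -> Self {
        let mut top_power: u64 = 1;
        for _ in 1..window_size {
            top_power = top_power.wrapping_mul(BASE);
        }
        Self { window_size, top_power, hash: 0 }
    }

    /// Hash a full window from scratch. O(window_size); used once at start.
    fn init(&mut self, window: &[u8]) {
        assert_eq!(window.len(), self.window_size);
        self.hash = 0;
        for &b in window {
            self.hash = self.hash.wrapping_mul(BASE).wrapping_add(u64::from(b));
        }
    }

    /// Slide the window: drop `outgoing`, append `incoming`. O(1).
    fn roll(&mut self, outgoing: u8, incoming: u8) {
        self.hash = self
            .hash
            .wrapping_sub(u64::from(outgoing).wrapping_mul(self.top_power))
            .wrapping_mul(BASE)
            .wrapping_add(u64::from(incoming));
    }
}
```

With this structure, the chunker calls `init` once and then `roll` per byte, restoring the overall `O(n)` scan the reviewer is asking for.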
I hope the revision based on this feedback can be reviewed. cc: @allada, @MarcusSorealheis
Reviewable status: 0 of 2 LGTMs obtained, and 0 of 9 files reviewed, and 4 discussions need to be resolved
a discussion (no related file): I'm still really failing to see why to change this. From what I can tell this is adding a significant amount of overhead to the existing algorithm to only gain the ability for a dynamically selected mask. The existing algorithm is a rolling checksum. Can you perform some measurements to show it performing better on any benchmark along with speed?
Thank you for sharing your thoughts. I understand your concerns about the potential overhead introduced by the changes, especially in the context of the existing rolling checksum algorithm. I will conduct some benchmarks to compare the performance of the current implementation against the proposed changes, focusing on both speed and efficiency. This data will help us make a more informed decision moving forward.
Were you ever able to get the benchmark data for this?