nativelink
nativelink copied to clipboard
Potential improvements to fastcdc
The code at https://github.com/TraceMachina/nativelink/blob/main/nativelink-util/src/fastcdc.rs seems unnecessary.
Potential alternatives:
- https://github.com/nlfiedler/fastcdc-rs
- https://github.com/florian-g2/fastcdc-rs-alt
There is also an interesting algorithm called UltraCDC from some of the FastCDC authors that seems worth investigating:
- https://github.com/PlakarLabs/go-cdc-chunkers/blob/main/chunkers/ultracdc/ultracdc.go
We do reference that our implementation is based on one of those listed: https://github.com/TraceMachina/nativelink/blob/b9029bb073a2d56d1a2b713fdb7d6ff4de69ff64/nativelink-util/src/fastcdc.rs#L5
We roll our own so we can use tokio_util::codec::Decoder on it as well as use Bytes. Otherwise we have to do some weird bindings between the two.
Oh I didn't realize. Looks like upstream uses Vec<8>. Maybe upstream would be interested in a Bytes variant, but this might not be a desirable change after all.
UltraCDC could be interesting though.
I will work on this issue.
cc: @allada, @MarcusSorealheis
I hope you can take a moment to review this algorithm approach. Your feedback would be greatly appreciated.
cc: @allada, @MarcusSorealheis