nativelink icon indicating copy to clipboard operation
nativelink copied to clipboard

Potential improvements to fastcdc

Open aaronmondal opened this issue 1 year ago • 4 comments

The code at https://github.com/TraceMachina/nativelink/blob/main/nativelink-util/src/fastcdc.rs seems unnecessary.

Potential alternatives:

  • https://github.com/nlfiedler/fastcdc-rs
  • https://github.com/florian-g2/fastcdc-rs-alt

There is also an interesting algorithm called UltraCDC from some of the FastCDC authors that seems worth investigating:

  • https://github.com/PlakarLabs/go-cdc-chunkers/blob/main/chunkers/ultracdc/ultracdc.go

aaronmondal avatar Mar 02 '24 21:03 aaronmondal

We do reference that our implementation is based on one of those listed: https://github.com/TraceMachina/nativelink/blob/b9029bb073a2d56d1a2b713fdb7d6ff4de69ff64/nativelink-util/src/fastcdc.rs#L5

We roll our own so we can use tokio_util::codec::Decoder on it as well as use Bytes. Otherwise we have to do some weird bindings between the two.

allada avatar Mar 02 '24 22:03 allada

Oh I didn't realize. Looks like upstream uses Vec<8>. Maybe upstream would be interested in a Bytes variant, but this might not be a desirable change after all.

UltraCDC could be interesting though.

aaronmondal avatar Mar 02 '24 22:03 aaronmondal

I will work on this issue.

cc: @allada, @MarcusSorealheis

leodziki avatar Aug 16 '24 18:08 leodziki

I hope you can take a moment to review this algorithm approach. Your feedback would be greatly appreciated.

cc: @allada, @MarcusSorealheis

leodziki avatar Aug 30 '24 04:08 leodziki