borg icon indicating copy to clipboard operation
borg copied to clipboard

new chunking algorithms

Open ThomasWaldmann opened this issue 8 years ago • 5 comments

fastCDC https://www.usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf

fastCDC are some optimizations on top of the very simple GEAR rolling hash.

old ticket about performance: https://github.com/borgbackup/borg/issues/1021 benchmark from there: https://github.com/borgbackup/borg/files/276308/chunker_bench.py.txt

ThomasWaldmann avatar Sep 12 '17 20:09 ThomasWaldmann

Current master branch buzhash chunker benchmark results (i5-4200u cpu, tw's laptop):

$ python chunker_bench.py 
    length   count      dt throughput chunks_count
1000000000       1 2.586 386.728    120
 100000000      10 2.633 379.811     12
  10000000     100 2.509 398.494      2
   1000000    1000 1.261 793.168      1
    100000   10000 0.109 9203.020      1
     10000  100000 0.230 4352.816      1
      1000 1000000 1.835 545.060      1
       100 1000000 1.798 55.614      1
        10 1000000 1.793 5.576      1
         1 1000000 1.803 0.555      1

ThomasWaldmann avatar Sep 12 '17 21:09 ThomasWaldmann

Current master + quick and dirty 32bit GEAR chunker benchmark results (tw's laptop):

    length   count      dt throughput chunks_count
1000000000       1 1.450 689.640    120
 100000000      10 1.499 667.227     12
  10000000     100 1.412 708.214      2
   1000000    1000 0.693 1442.994      1
    100000   10000 0.108 9257.274      1
     10000  100000 0.225 4441.152      1
      1000 1000000 1.785 560.084      1
       100 1000000 1.744 57.351      1
        10 1000000 1.751 5.711      1
         1 1000000 1.750 0.571      1

ThomasWaldmann avatar Sep 13 '17 00:09 ThomasWaldmann

There is also rabin:

https://restic.github.io/blog/2015-09-12/restic-foundation1-cdc

Would be interesting to compare those three.

rumpelsepp avatar Sep 14 '17 05:09 rumpelsepp

While reading some news about plakar, it talks about UltraCDC. Unfortunately, it's behind the closed doors of IEEE (https://ieeexplore.ieee.org/document/9894295), but maybe the author of plakar can be contacted for that. He made an implementation in Go available at https://github.com/PlakarLabs/go-cdc-chunkers/tree/main/chunkers/ultracdc

Glandos avatar Oct 08 '23 20:10 Glandos