bupstash
Parallel put pipeline
Currently the 'put' command's profile is split roughly into four equal parts: rollsum chunking, hashing, compressing, and encrypting. If this work were done as a pipeline, with one thread per stage, we should be able to get a large speedup without significantly complicating the design.
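A minimal sketch of the one-thread-per-stage idea, using bounded channels for backpressure. The stage functions here are placeholders, not bupstash's real rollsum/hash/compress/encrypt code:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Placeholder stages; a real pipeline would run bupstash's content
// hashing, zstd compression, and libsodium encryption here.
fn hash(chunk: Vec<u8>) -> (u64, Vec<u8>) {
    let h = chunk
        .iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64));
    (h, chunk)
}
fn compress(data: Vec<u8>) -> Vec<u8> { data } // stub
fn encrypt(data: Vec<u8>) -> Vec<u8> { data }  // stub

// Run chunks through hash -> compress -> encrypt, one thread per
// stage; bounded channels keep a fast stage from outrunning a slow one.
fn pipeline_put(chunks: Vec<Vec<u8>>) -> Vec<(u64, Vec<u8>)> {
    let (tx_h, rx_h) = sync_channel::<Vec<u8>>(16);
    let (tx_c, rx_c) = sync_channel::<(u64, Vec<u8>)>(16);
    let (tx_e, rx_e) = sync_channel::<(u64, Vec<u8>)>(16);

    let hasher = thread::spawn(move || {
        for chunk in rx_h {
            tx_c.send(hash(chunk)).unwrap();
        }
    });
    let compressor = thread::spawn(move || {
        for (h, data) in rx_c {
            tx_e.send((h, compress(data))).unwrap();
        }
    });
    let encryptor = thread::spawn(move || {
        rx_e.into_iter().map(|(h, d)| (h, encrypt(d))).collect::<Vec<_>>()
    });

    for chunk in chunks {
        tx_h.send(chunk).unwrap();
    }
    drop(tx_h); // closing the first sender shuts the pipeline down in order
    hasher.join().unwrap();
    compressor.join().unwrap();
    encryptor.join().unwrap()
}
```

Dropping the head sender closes each channel in turn, so every stage drains and exits cleanly while chunk order is preserved end to end.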
This is not necessarily high priority, since much of this work is skipped by the stat cache anyway. The stat cache probably deserves its own optimization effort, since I'm sure it could be made faster too.
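To illustrate why the stat cache skips most of this work, here is a hypothetical sketch (the `StatKey`/`StatCache` names and layout are illustrative, not bupstash's actual on-disk cache): unchanged files hit the cache on their stat metadata and bypass chunking, hashing, compression, and encryption entirely.

```rust
use std::collections::HashMap;

// Hypothetical cache key: a file is assumed unchanged if its path,
// mtime, and size all match a previous run.
#[derive(Hash, PartialEq, Eq, Clone)]
struct StatKey {
    path: String,
    mtime: i64,
    size: u64,
}

// Hypothetical stat cache mapping unchanged files to the chunk
// addresses computed on a previous run.
struct StatCache {
    entries: HashMap<StatKey, Vec<[u8; 32]>>,
}

impl StatCache {
    fn new() -> Self {
        StatCache { entries: HashMap::new() }
    }

    // On a hit, the caller reuses these addresses and skips the whole
    // chunk/hash/compress/encrypt pipeline for this file.
    fn lookup(&self, key: &StatKey) -> Option<&Vec<[u8; 32]>> {
        self.entries.get(key)
    }

    fn insert(&mut self, key: StatKey, addrs: Vec<[u8; 32]>) {
        self.entries.insert(key, addrs);
    }
}
```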
Experiments with a prototype branch yielded roughly a 10 percent speed gain on a large local backup.
One important observation: we can process directories entirely in parallel, as long as the final index is built serially. One way to do that is to process directories in parallel workers, then run a serial pass that takes the cached hashes and forms the actual htree.
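The parallel-workers-plus-serial-pass shape could look like the following sketch. `process_dir` is a stand-in that just hashes file names; the real worker would do the full per-file work, and the serial join stands in for the htree-building pass:

```rust
use std::thread;

// Stand-in worker: "hash" every file in one directory. A real worker
// would chunk, hash, and upload file contents.
fn process_dir(dir: &str, files: Vec<String>) -> (String, Vec<u64>) {
    let hashes = files
        .iter()
        .map(|f| {
            f.bytes()
                .fold(0u64, |a, b| a.wrapping_mul(131).wrapping_add(b as u64))
        })
        .collect();
    (dir.to_string(), hashes)
}

// Process directories in parallel workers, then serially collect the
// per-directory results into the final index.
fn build_index(dirs: Vec<(String, Vec<String>)>) -> Vec<(String, Vec<u64>)> {
    let handles: Vec<_> = dirs
        .into_iter()
        .map(|(d, fs)| thread::spawn(move || process_dir(&d, fs)))
        .collect();
    // Serial pass: joining in spawn order keeps the index deterministic
    // regardless of which worker finishes first.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Joining handles in spawn order is what keeps the index serial and deterministic even though the directory work itself races freely.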
I implemented this, as it was important for large CephFS clusters.