
Parallel put pipeline

Open andrewchambers opened this issue 4 years ago • 3 comments

Currently, the profile of the 'put' command is divided into roughly four equal parts: rollsum, hashing, compressing, and encrypting. If this work is done in a pipeline, with one thread per stage, we should be able to get a large speedup without significantly complicating the design.
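To make the idea concrete, here is a minimal sketch of a four-stage pipeline with one thread per stage, connected by bounded channels from the standard library. It is not bupstash code: `run_pipeline`, `Item`, and `PLACEHOLDER_KEY` are hypothetical names, and every stage body is a stand-in (fixed-size splitting instead of a rollsum, `DefaultHasher` instead of a cryptographic hash, identity instead of compression, XOR instead of encryption).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical placeholder key; XOR is NOT real encryption.
pub const PLACEHOLDER_KEY: u8 = 0xAA;

/// Hypothetical unit of work passed between stages: (address, payload).
pub type Item = (u64, Vec<u8>);

/// Runs four placeholder stages, one thread each, and returns the results in input order.
pub fn run_pipeline(input: Vec<u8>, chunk_size: usize) -> Vec<Item> {
    // Bounded channels provide backpressure so a fast stage cannot run far ahead.
    let (chunk_tx, chunk_rx) = sync_channel::<Vec<u8>>(4);
    let (hash_tx, hash_rx) = sync_channel::<Item>(4);
    let (comp_tx, comp_rx) = sync_channel::<Item>(4);
    let (enc_tx, enc_rx) = sync_channel::<Item>(4);

    let stages = vec![
        // Stage 1: chunking. Fixed-size splitting stands in for the rollsum.
        thread::spawn(move || {
            for chunk in input.chunks(chunk_size.max(1)) {
                if chunk_tx.send(chunk.to_vec()).is_err() {
                    break;
                }
            }
        }),
        // Stage 2: hashing. DefaultHasher stands in for a cryptographic hash.
        thread::spawn(move || {
            for chunk in chunk_rx {
                let mut h = DefaultHasher::new();
                chunk.hash(&mut h);
                if hash_tx.send((h.finish(), chunk)).is_err() {
                    break;
                }
            }
        }),
        // Stage 3: compression. Identity stands in for a real compressor.
        thread::spawn(move || {
            for item in hash_rx {
                if comp_tx.send(item).is_err() {
                    break;
                }
            }
        }),
        // Stage 4: encryption. XOR with a fixed byte stands in for a real cipher.
        thread::spawn(move || {
            for (addr, mut payload) in comp_rx {
                payload.iter_mut().for_each(|b| *b ^= PLACEHOLDER_KEY);
                if enc_tx.send((addr, payload)).is_err() {
                    break;
                }
            }
        }),
    ];

    // Each stage is single-threaded and channels are FIFO, so chunk order is preserved.
    let out: Vec<Item> = enc_rx.iter().collect();
    for stage in stages {
        stage.join().expect("pipeline stage panicked");
    }
    out
}
```

Calling `run_pipeline(data, 16)` on a 100-byte buffer yields 7 items in the original chunk order. With equal stage costs the ideal speedup approaches 4x, but the slowest stage and channel overhead set the real limit.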

andrewchambers avatar Mar 22 '21 01:03 andrewchambers

This is not necessarily high priority, as much of the work is skipped with a stat cache anyway. The stat cache perhaps deserves its own optimization section, since I'm sure it could be made faster too.

andrewchambers avatar Mar 22 '21 01:03 andrewchambers

Some experiments with a prototype branch show speed gains of roughly 10 percent on a large local backup.

andrewchambers avatar Apr 04 '21 03:04 andrewchambers

Something important and useful is that we can process directories entirely in parallel, as long as we build the final index serially. One way to do that is to process directories in parallel workers, then do a serial pass that takes the cached hashes and forms the actual htree.
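A rough sketch of that two-phase shape, under stated assumptions: directories are modelled as in-memory values rather than read from disk, `DefaultHasher` stands in for the real hash, and a flat sorted list plus one combined hash stands in for the htree. `Dir`, `hash_dirs_parallel`, and `build_index` are hypothetical names, not bupstash APIs. It uses `std::thread::scope`, which needs Rust 1.63 or later.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;
use std::thread;

/// Hypothetical in-memory directory: a path plus the contents of its files.
pub struct Dir {
    pub path: String,
    pub files: Vec<Vec<u8>>,
}

fn hash_bytes(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Phase 1: hash directories on up to `workers` threads, in whatever order they finish.
pub fn hash_dirs_parallel(dirs: &[Dir], workers: usize) -> BTreeMap<String, Vec<u64>> {
    let cache = Mutex::new(BTreeMap::new());
    let workers = workers.max(1);
    // Ceiling division so every directory lands in some batch; at least 1 per batch.
    let per_worker = ((dirs.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        for batch in dirs.chunks(per_worker) {
            let cache = &cache;
            s.spawn(move || {
                for dir in batch {
                    let hashes: Vec<u64> = dir.files.iter().map(|f| hash_bytes(f)).collect();
                    cache.lock().unwrap().insert(dir.path.clone(), hashes);
                }
            });
        }
    });
    cache.into_inner().unwrap()
}

/// Phase 2: a single serial pass over the cached hashes, in sorted path order,
/// so the result does not depend on how the workers were scheduled.
pub fn build_index(cache: &BTreeMap<String, Vec<u64>>) -> (Vec<(String, u64)>, u64) {
    let mut entries = Vec::new();
    let mut root = DefaultHasher::new();
    for (path, hashes) in cache {
        for (i, h) in hashes.iter().enumerate() {
            entries.push((format!("{}/{}", path, i), *h));
            h.hash(&mut root);
        }
    }
    (entries, root.finish())
}
```

The point of the split is that `build_index(&hash_dirs_parallel(&dirs, 1))` and `build_index(&hash_dirs_parallel(&dirs, 4))` return the same index: only the expensive hashing is parallel, while ordering is decided in the serial pass.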

andrewchambers avatar Jun 26 '21 10:06 andrewchambers

I implemented this, as it was important for large CephFS clusters.

andrewchambers avatar Sep 23 '22 10:09 andrewchambers