
Optimization - CPU

tasket opened this issue 6 years ago • 10 comments

Changes that may improve throughput, especially for send:

  • Multithreaded encoding layer for compression and encryption
  • Other areas for concurrency: getting deltas, dedup init, send and receive main loops
  • Alternatives to tar streaming, such as direct file IO for internal: destination
  • Static buffers to avoid garbage collection
  • Structs, especially in deduplication code
  • Explore new Python optimization options
  • Tighten the main send loop, use locals
  • Use formats instead of + string concat (both sketched after this list)
  • Quicker compression – issue #23
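
To make the last two items concrete, here is a rough sketch of what those micro-optimizations can look like in Python (hypothetical names, not Wyng's actual send loop):

    # Hypothetical hot-loop sketch; 'chunks' and 'out' are illustrative, not Wyng code.
    import hashlib

    def send_chunks(chunks, out):
        sha256 = hashlib.sha256      # hoist global/attribute lookups into locals
        write  = out.write           # so they aren't re-resolved on every pass
        for addr, data in chunks:
            digest = sha256(data).hexdigest()
            # one format call instead of repeated '+' concatenation
            header = "%016x %s %d\n" % (addr, digest, len(data))
            write(header.encode())
            write(data)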

tasket commented Dec 14 '18

An optimization attempt was posted to the optimize1 branch. Unfortunately, the limited testing I did showed little if any difference.

I may try re-introducing some of these changes on top of alpha4 and do some more extensive trials.

tasket commented Dec 19 '18

(Note) Some optimization of prune/merge was recently done by setting LC_ALL=C and using -m merge where possible. In some cases this slashes pruning time by more than 75%.
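
For context, a minimal sketch of that kind of invocation from Python, assuming the merge is done with GNU coreutils sort (the actual Wyng code may differ):

    # Hypothetical example: merging already-sorted manifest files with coreutils sort.
    # LC_ALL=C selects plain byte-wise collation (much faster than locale-aware
    # sorting); -m merges pre-sorted inputs instead of doing a full sort.
    import os, subprocess

    env = dict(os.environ, LC_ALL="C")
    subprocess.run(["sort", "-m", "-o", "merged.lst", "a.lst", "b.lst"],
                   env=env, check=True)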

tasket commented Feb 22 '19

https://github.com/tasket/wyng-backup/issues/179#issuecomment-1961553338

An interesting observation.

The first 3 blurps of traffic are:

with 128K blocks
with 2048K blocks
with 2048K blocks and compression set to zstd:1

Those tests were run against a 500GB test volume. The 4th block is the 22TB volume. I would expect the B/s to be in the same general ballpark, but there is something going on.

[image: traffic graph of the four test transfers]

tasket commented Feb 23 '24

@alvinstarr I think what may be going on is the large difference in CPython's garbage collection workload, due to dynamic buffering having to juggle larger buffers. (Hmmm. Does a 1MB buffer behave much differently?)

Wyng does not yet use static buffering for transfer operations. And I always suspected that locally-based archives would someday throw performance issues that were masked by net access into high relief (as your benchmark just did).

It would also be interesting to see the difference, for instance, with the helper script removed from the local transfer loop. That in combination with using static buffers could make a big difference, IMO. However, the limitations of the zstandard lib I'm currently using preclude static buffering.
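
For illustration, a static-buffer read loop could look something like the sketch below (hypothetical names, not Wyng's code): a bytearray is allocated once and refilled in place with readinto(), so no new bytes object is created per chunk.

    # Hypothetical static-buffer sketch; volpath and process() are placeholders.
    volpath   = "/dev/mapper/example-vol"   # placeholder source volume
    chunksize = 2 * 1024 * 1024

    buf  = bytearray(chunksize)             # allocated once, reused every pass
    view = memoryview(buf)

    def process(chunk):                     # stand-in for hash/compress/send work
        pass

    with open(volpath, "rb") as vf:
        while True:
            n = vf.readinto(buf)            # fill the existing buffer in place
            if n == 0:
                break
            process(view[:n])               # hand off only the filled slice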

One really cheap (and safe) tweak you could try in the Wyng code is to remove the file IO buffering= parameter, letting CPython adjust automatically:

    # Open source volume and its delta bitmap as r, session manifest as w.
    with open(volpath,"rb", buffering=chunksize) as vf,    \

(Moved to issue 11.)

tasket commented Feb 23 '24

@alvinstarr I've tested a simple modification to Wyng that bypasses the tarfile streaming when the destination is a local filesystem. This improves the throughput by 17%.

The parameters I'm using are: same SSD source and destination, 2MB chunks, zstd:0, and no encryption.
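
For a rough idea of what the bypass does (hypothetical layout and names, not Wyng's actual on-disk format), the local case can write chunk files directly instead of framing them into a tar stream:

    # Hypothetical sketch of the local-destination bypass: each chunk is written
    # straight to a file under destdir instead of being wrapped in a tar member.
    import os

    def send_direct(chunks, destdir):
        for name, data in chunks:           # chunks: iterable of (relpath, bytes)
            path = os.path.join(destdir, name)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(data)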

tasket commented Feb 26 '24

BTW, removing the buffering=chunksize parameter did not improve throughput.

tasket commented Feb 26 '24

Got sidetracked a bit. I will take a look at your changes and see if they help our situation.

We have a backup and an incremental of our 27TB volume. It's good to hear about the speed improvement, since it took 4 days to run the 27TB backup. The first incremental took close to 2 days.

We are looking at copying our backups to a remote location so that we can have off-site storage. As part of that process we started copying using rsync and scp, but we ran into BDP (bandwidth-delay product) problems. To get around this we are using bbcp (https://www.slac.stanford.edu/~abh/bbcp/). The speed improvement has been from 20 Mbps to 500 Mbps by using bbcp. Not sure if you can leverage it, but bbcp can provide a huge speed improvement for large data sets.

alvinstarr commented Feb 28 '24

@alvinstarr Sorry, I got sidetracked as well. I just posted the --tar-bypass optimization for send after fixing some bugs. Use it with send when the dest archive is local (file:/...). It will indicate it is bypassing the tar stream.

The kicker is that while I'm seeing throughput increase >17% on an archive with 2MB chunks, I also tried an archive with 1MB chunks. It appears that the 1MB chunk size is the fastest for sending an initial/whole volume, regardless of whether tar-bypass is used, and the gain in throughput going from 2MB to 1MB is about 25%.

Overall, sending to a 1MB-chunk archive while using --tar-bypass, I saw as much as a 37% gain in throughput.

The tar-bypass is considered experimental at this point, although I don't anticipate it causing any issues.

Thanks for the tip about bbcp for backup copies. To be useful inside of Wyng, a copy/archive utility would have to handle streams from memory as well as files (this is why Python's tarfile lib was used). Getting a similar multi-thread, multi-stream boost in wyng send will probably require using asyncio or one of the newer multiprocessing options.
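
For reference, this is roughly the kind of in-memory streaming that tarfile allows (a sketch, not Wyng's actual code): a chunk held in memory can be added to a non-seekable tar stream without ever touching a temp file.

    # Sketch: appending an in-memory chunk to a streaming ("w|") tar archive.
    import io, sys, tarfile

    data = b"example chunk data"                      # placeholder payload
    with tarfile.open(fileobj=sys.stdout.buffer, mode="w|") as tf:
        info = tarfile.TarInfo("volume/x000000000")   # hypothetical member name
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))            # streamed from memory, not a file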

tasket commented Feb 29 '24

@alvinstarr PS - You may want to look into bbcp's behavior when updating existing file sets. The documentation has scant info on that subject and I could not figure out if it would skip files based on file timestamps, for example. It also doesn't seem to have a delete feature. So my own preference would be, after the initial bbcp copy, to use rsync -aH --delete to update the offsite backups. Of course, if you have doubts about the effect of a copy or update, you can always use Wyng's arch-check feature to check the integrity of the copy.

tasket commented Feb 29 '24

bbcp is definitely not a replacement for rsync, because of the delete and sync-of-existing-files issues you mentioned. Also, for lots of small files like we are running here, the best way to use bbcp is in pipe mode with tar on both ends.

bbcp may not integrate well into what you're doing, but it may be possible to leverage the knowledge and work that has gone into its network socket processing.

alvinstarr commented Mar 01 '24