
Throttling (for sync and other)

Open SuperTango opened this issue 7 years ago • 14 comments

Can we get throttling control in the b2 command line tool (especially for sync)?

Thanks.

SuperTango avatar Dec 26 '16 18:12 SuperTango

Hi @SuperTango, thanks for your interest in B2 CLI. You can already influence the speed to some extent by using the --threads N parameter. If that is not sufficient for you, could you please describe your use case, so that we can better understand it?

ppolewicz avatar Dec 26 '16 18:12 ppolewicz

Thanks @ppolewicz. My use case is pretty simple: I only want to use a percentage of the available bandwidth. For example, my outbound pipe from the datacenter where my Linux machine is has a max throughput of about 200kBps, but for backups I want to ensure we only use a max of 75kBps.

SuperTango avatar Dec 26 '16 18:12 SuperTango

It is possible to implement such a limiter, but doing it well in our environment is not easy, as we support many threads. I have spent some time searching and could not find a good open-source module that would do the heavy lifting.

Have you tried using trickle?

ppolewicz avatar Dec 26 '16 19:12 ppolewicz

This is also a feature I was looking for a while ago. I tried trickle back then and wasn't able to limit the bandwidth. I don't know what the problem was, so maybe there is a workaround. I've just moved the backup job to the middle of the night, so it's not a priority for me.

svonohr avatar Dec 26 '16 20:12 svonohr

I think you can also use iptables to limit bandwidth per destination. This will not allow you to set different limits if you run two sync processes concurrently.

I have researched this further and I got interested in writing something like it, just because I found lots of questions about this and no answers other than "use urlgrabber" (which is a libcurl wrapper). But first I need to deal with another challenge in b2 cli, so I'll leave it unassigned.

I don't think it is worth implementing this just for b2 CLI, but it can be made abstract enough to become useful elsewhere.

If someone is going to work on this, please post here so that we can coordinate.

ppolewicz avatar Dec 26 '16 20:12 ppolewicz

I think this is a pretty core feature for any backup (especially a sync) solution. Not flooding the network when performing a backup of potentially Terabytes of data is a requirement for me, not a "nice to have".

I haven't looked at the B2 command line tool codebase, but I implemented a simple yet effective throttling solution for another product I worked on a long time ago. It wasn't particularly difficult, but we were writing to sockets directly (not using a 3rd party lib). With many threads, having each thread use 1/N (N = number of threads) of the bandwidth is good enough for this use case.
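
As a rough illustration of the 1/N idea (a sketch only, not code from b2 or from the product mentioned above; `PerThreadThrottle` and its parameters are names made up for this example), each thread can track how many bytes it has sent and sleep whenever it gets ahead of its fixed share:

```python
import time


class PerThreadThrottle:
    """Hypothetical helper: holds one thread to total_rate / n_threads."""

    def __init__(self, total_bytes_per_sec, n_threads,
                 clock=time.monotonic, sleep=time.sleep):
        # Each thread gets a fixed, equal share of the overall limit.
        self.rate = total_bytes_per_sec / n_threads
        self._clock = clock
        self._sleep = sleep
        self._start = clock()
        self._sent = 0

    def throttle(self, nbytes):
        """Call after sending nbytes; sleeps if the thread is ahead of its share.

        Returns the number of seconds slept (0.0 if none). Idle time builds
        up credit, so a short burst after a pause is allowed.
        """
        self._sent += nbytes
        # Earliest moment at which this many bytes fits within the share.
        earliest = self._start + self._sent / self.rate
        delay = earliest - self._clock()
        if delay > 0:
            self._sleep(delay)
        return max(delay, 0.0)
```

With the 75kBps cap from earlier in this thread and, say, 3 threads, each thread would create `PerThreadThrottle(75_000, 3)` and call `throttle(len(chunk))` after every chunk it writes.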

SuperTango avatar Dec 27 '16 20:12 SuperTango

Sync should be smart: if there is an upload limit and a download limit, it should maximize the usage of both resources to minimize the session time, right? If only plain limits are added, the bottleneck will likely be on uploading first and then on downloading.

Another issue is that the number of parallel uploads/downloads will change over time as new tasks are scheduled and executed. A simple 1/N would be quite inefficient when compared to a smart one.
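
One way to avoid the inefficiency of a fixed 1/N split is a single budget shared by all threads, so that whichever transfers are active at the moment can use the whole limit. A minimal sketch, assuming a token-bucket approach; this is not b2 or b2sdk code, and `SharedTokenBucket` and its parameters are hypothetical names:

```python
import threading
import time


class SharedTokenBucket:
    """Hypothetical limiter: one byte budget shared by every transfer thread."""

    def __init__(self, rate, capacity=None,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)  # bytes per second, all threads combined
        self.capacity = float(rate if capacity is None else capacity)
        self._tokens = self.capacity  # start with a full burst allowance
        self._clock = clock
        self._sleep = sleep
        self._last = clock()
        self._lock = threading.Lock()

    def consume(self, nbytes):
        """Call before sending nbytes; returns the number of seconds waited."""
        with self._lock:
            now = self._clock()
            # Refill for the time elapsed since the last call, up to capacity.
            refill = (now - self._last) * self.rate
            self._tokens = min(self.capacity, self._tokens + refill)
            self._last = now
            # Always deduct; a negative balance is debt the caller waits off.
            self._tokens -= nbytes
            wait = max(0.0, -self._tokens / self.rate)
        if wait > 0:
            self._sleep(wait)  # sleep outside the lock so others can proceed
        return wait
```

All upload or download threads would share one instance and call `consume(len(chunk))` before each chunk. A lone active thread then gets the full rate, and the split adjusts by itself as tasks start and finish. Balancing an upload limit against a download limit, as described above, would still need separate logic.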

If you would be willing to contribute some code to b2 CLI, it would be very welcome! We encourage outside contributors to make changes to our codebase. Many such changes have been merged already. In order to make it easier to contribute, core developers of this project:

  • provide guidance (through the issue reporting system)
  • provide tool assisted code review (through the Pull Request system)
  • maintain a set of integration tests (run with a production cloud)
  • maintain a set of (well over a hundred) unit tests
  • automatically run unit tests on 14 versions of python (including osx, Jython and pypy)
  • format the code automatically using yapf
  • use static code analysis to find subtle/potential issues with maintainability
  • maintain other Continuous Integration tools (coverage tracker)

ppolewicz avatar Dec 27 '16 20:12 ppolewicz

Trickle works for me, an example is trickle -s -u 200 b2 sync --threads 1 /src b2://dst

rwky avatar Jun 03 '18 20:06 rwky

Does b2 download_file_by_id use threads as well? I'm using it to get specific versions of files and it saturates my bandwidth and sometimes causes issues. I will try @rwky's trickle example. Are there any plans to implement --threads N on b2 download_file_by_id? Thanks!

devhen avatar Jan 18 '20 01:01 devhen

In the current version it uses threads to parallelize downloads (it's required by the b2 integration checklist); however, the number of threads is not changeable from the CLI yet.

The uploading/downloading machinery in b2sdk is being reworked as we speak. One of the many improvements will be the ability to change the number of upload and download threads, or maybe even to provide native bandwidth limiters, as more global settings, so that you can tweak them for download, upload, sync, copy and metadata operations (sync listing the contents of the bucket internally also consumes bandwidth). Bandwidth limiting is not planned in the initial scope of the rework, but the new structure of the code goes a long way towards enabling it.

ppolewicz avatar Jan 22 '20 14:01 ppolewicz

We really need to have this in b2 CLI directly.

I see some people are suggesting trickle here. However, note that trickle does NOT work with Python 3.x, only Python 2.x. You cannot use trickle to limit the bandwidth utilization of python3 scripts; it will fail silently.

Edit: for those looking for some kind of solution, if you can throttle the NIC of the host doing the uploads, you can do so for the duration of the upload. However, this is only viable when you have nothing else running on the host, and it is not really a solution to this issue per se.

Addvilz avatar Jun 12 '21 13:06 Addvilz

@rwky wrote: "Trickle works for me, an example is trickle -s -u 200 b2 sync --threads 1 /src b2://dst"

That doesn't appear to have any impact for me. Could it be related to trickle having no effect on Python 3 scripts? I am using Ubuntu 20.04 with B2 version 3.2.1 and trickle version 1.07.

My command (in case I am doing it wrong):

trickle -v -s -u 1 -t 1 b2 sync \
  --delete \
  --threads 1 \
  $FOLDER_TO_BACKUP \
  b2://${B2_BUCKET_NAME}/test

programster avatar Mar 15 '22 15:03 programster

It looks like the Ubuntu version of trickle from apt doesn't work very well. That bug report says you should just compile it from source, then it will work. Maybe do that, as opposed to implementing rate limiting in every single program you will ever use in a constrained environment.

Actually the best way to solve it permanently would be to bug Ubuntu to fix their trickle to work with python3.

ppolewicz avatar Mar 15 '22 17:03 ppolewicz

I've had a bit of luck with the Ubuntu packaged version of Trickle by setting the number of b2 threads to 1.

blewa avatar Oct 06 '22 04:10 blewa