
Multiple Connections/Streams

Slind14 opened this issue 3 years ago • 18 comments

Are there any plans for supporting multiple concurrent connections for the data transfer? Or is this already possible somehow?

Doing backups across > 1G networks is quite slow due to the bottleneck of a single connection. For cross-continent backups it can be even worse: a single connection can't saturate a 1G link and sits at 200M max.

Slind14 avatar Jun 26 '22 11:06 Slind14

You can run multiple borg processes in parallel, backing up to one repo per process.
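
for example, a rough sketch (repo urls and paths are just placeholders):

```bash
# one borg process per repo, run concurrently
borg create ssh://backup.example.com/./repo-a::{now} /data/part-a &
borg create ssh://backup.example.com/./repo-b::{now} /data/part-b &
wait
```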

ThomasWaldmann avatar Jun 26 '22 12:06 ThomasWaldmann

Hi Thomas,

we use it to backup data from a data warehouse. We can't split the data across multiple repos without losing consistency I'm afraid. Is there another option?

Slind14 avatar Jun 26 '22 12:06 Slind14

no. not being able to saturate your connection with 1 borg likely comes from internal processing being single-threaded and not internally queued.

but not sure how you ensure consistency. if you used a snapshot to get consistency, you could also run multiple borg to save the snapshot.
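
e.g. a hypothetical sketch using an LVM snapshot (volume names and paths are made up):

```bash
# take ONE consistent snapshot, then back up two subtrees of it in parallel
lvcreate --snapshot --size 10G --name data-snap /dev/vg0/data
mkdir -p /mnt/snap
mount -o ro /dev/vg0/data-snap /mnt/snap
borg create ssh://backup.example.com/./repo-a::{now} /mnt/snap/part-a &
borg create ssh://backup.example.com/./repo-b::{now} /mnt/snap/part-b &
wait
umount /mnt/snap
lvremove -y /dev/vg0/data-snap
```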

ThomasWaldmann avatar Jun 26 '22 13:06 ThomasWaldmann

Is this the first backup you are doing or is there already data in the repo from previous backups?

ThomasWaldmann avatar Jun 26 '22 13:06 ThomasWaldmann

it is not the first backup, we just got to the point where the backups can't complete within a day anymore.

When we use iperf3 to measure the bandwidth, we can see that a single connection only gets 100-200M while multiple connections get > 900M.
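
For reference, the comparison was done with commands along these lines (hostname is a placeholder):

```bash
# single TCP stream
iperf3 -c backup.example.com
# 8 parallel streams
iperf3 -c backup.example.com -P 8
```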

For data centers that are not on the other side of the world, we get higher bandwidth on a single connection, so I doubt it is borg directly. Btw. borg CPU usage always sits at 10-20% of one core while uploading; only when saving the file cache does it go to 100% while bandwidth drops to 0. The files are also quite large (multiple GB).


We do have a hardlink-based snapshot. How would we run multiple borg processes and ensure that they are not cannibalizing each other and also that we end up with a consistent backup?

Slind14 avatar Jun 26 '22 13:06 Slind14

borg manages caching, indexes and locking based on the repo id (which is unique and random). so you can run borg on the same machine, as the same user, at the same time IF you use different repos.

so you could partition your input data set and give each part to another borg.

ThomasWaldmann avatar Jun 26 '22 13:06 ThomasWaldmann

also wondering why a not-first backup takes that long. does the dedup not work or is it really lots of NEW data?

ThomasWaldmann avatar Jun 26 '22 13:06 ThomasWaldmann

> also wondering why a not-first backup takes that long. does the dedup not work or is it really lots of NEW data?

There is more new data than 100 MBit/s can transfer.

Slind14 avatar Jun 26 '22 13:06 Slind14

> borg manages caching, indexes and locking based on the repo id (which is unique and random). so you can run borg on the same machine, as the same user, at the same time IF you use different repos.
>
> so you could partition your input data set and give each part to another borg.

Unfortunately, partitioning is not possible with the way the data is stored. 90% is under the same directory, spread across around one million files.

Slind14 avatar Jun 26 '22 13:06 Slind14

ok.

iirc there is some --upload-buffer (or so) option, maybe you can try using that to speed it up.

do you use some fast compression (default is lz4; zstd,1 .. zstd,3 would also work i guess)?
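
a rough sketch combining both (the values are just a starting point, see borg help create):

```bash
# buffer remote uploads (size in MiB, borg 1.2+) and use light zstd compression
borg create --upload-buffer 100 --compression zstd,3 \
    ssh://backup.example.com/./repo::{now} /data
```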

ThomasWaldmann avatar Jun 26 '22 13:06 ThomasWaldmann

another idea is not to use different repo for partitions of the data, but for different times.

not pretty, but would work: use a different repo depending on weekday.
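
something like this (untested sketch, repo url is a placeholder):

```bash
# rotate through seven repos, one per weekday
# %a gives the abbreviated weekday name (locale-dependent), e.g. repo-Mon
REPO="ssh://backup.example.com/./repo-$(date +%a)"
borg create "$REPO::{now}" /data
```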

ThomasWaldmann avatar Jun 26 '22 13:06 ThomasWaldmann

> iirc there is some --upload-buffer (or so) option, maybe you can try using that to speed it up.

the data is already compressed, hence we don't use any compression


Are there any plans to support multi-connection uploads? Would it be a major change or something simple?

Slind14 avatar Jun 26 '22 14:06 Slind14

> another idea is not to use different repo for partitions of the data, but for different times.

the majority of the new data is from the last 24 hours :( it is all in the same place - not really possible to split.

Slind14 avatar Jun 26 '22 14:06 Slind14

--upload-buffer is about buffering, not compression.

ThomasWaldmann avatar Jun 26 '22 14:06 ThomasWaldmann

> --upload-buffer is about buffering, not compression.

Sorry, I quoted the wrong line. ;)

Slind14 avatar Jun 26 '22 14:06 Slind14

Unfortunately, changing the buffer does not help.

Restic added parallel uploads not too long ago; if borg had something similar, it would be great.

https://github.com/restic/restic/pull/3593 https://github.com/restic/restic/pull/3513

Slind14 avatar Jun 26 '22 21:06 Slind14

with the current backend structure, multi-connection uploads are not sensibly possible, as the log-structured store is not concurrent and the encryption scheme is also not yet prepared for such a scenario

i would imagine that a major refactor would be necessary to support them

RonnyPfannschmidt avatar Jun 27 '22 05:06 RonnyPfannschmidt

I see, thank you.

Slind14 avatar Jun 27 '22 09:06 Slind14