
apparent blocking when uploading via opendal

Open · talss89 opened this issue 1 year ago · 1 comment

Hello :wave:

rustic_core 0.2.0

I'm noticing a (perhaps) unusual behaviour when taking a snapshot and pushing it over opendal:s3.

The backup progressbar seems to increment, then pause, then increment ... until complete.

When backing up locally, the progress bar moves constantly, which leads me to suspect either:

  • a) Rustic builds packs, counts progress, then uploads in a blocking fashion
  • b) A transmit buffer is being saturated, throttling the entire process.

I haven't had the opportunity to dive into the implementation, but I was wondering if it would be possible to discuss either:

  • a) Running build and transmit in separate threads, and incrementing progress bytes when tx has completed, or;
  • b) Customising the size of the transmit buffer

Could anybody shed some light on what is going on, and how we might come up with a more performant solution?

My apologies if this is the wrong place to discuss this, and if my understanding is way off.

Many thanks to everyone who is working on this wonderful library.

talss89 avatar Feb 09 '24 18:02 talss89

Thanks @talss89 for opening this issue and sorry for the very late reply :-/

> When backing up locally, the progress bar moves constantly, which leads me to suspect either:
>
> * a) Rustic builds packs, counts progress, then uploads in a blocking fashion
> * b) A transmit buffer is being saturated, throttling the entire process.

The progress bar actually increases after the data has been chunked and successfully inserted into the queue, which then compresses (if configured), encrypts, and packs it into an in-memory data pack file. This in-memory pack file is then transferred to the backend. There are channels between the stages which pass the data to the next step, but which also block if the data cannot be processed fast enough.

  • So, the progress bar does not fully represent how much data has been completely processed. This should be corrected.
  • It could also be that we are not perfectly efficient here. But on the other hand, if the bottleneck is uploading, then at some point (independently of the queue sizes used) every queue is filled, and progress only continues at the rate the backend can accept pack files.
  • We have some vague ideas about how to optimize the processing (which is currently also quite complicated, which is not good), maybe using some kind of actor model throughout the whole process. But first we want to implement high-level integration tests so we can be sure that we won't introduce any regressions when doing this.

> I haven't had the opportunity to dive into the implementation, but I was wondering if it would be possible to discuss either:
>
> * a) Running build and transmit in separate threads, and incrementing progress bytes when tx has completed, or;

This is already separated. What could be improved is where the byte counter is incremented. But: if we use large pack sizes and only increment once the data has been transferred, we will always see the mentioned behaviour. Better would be to have multiple progresses: read data / processed data (i.e. chunked/.../until packed in-memory) / transferred data. But I didn't get multi-progress working with indicatif, so far...

* b) Customising the size of the transmit buffer

This is currently also not possible. But I think the way to go is to make the queue lengths customizable.

aawsome avatar Mar 09 '24 20:03 aawsome