Upload speed is wrong when chunks are skipped
When resuming an incomplete backup, Duplicacy counts the chunks it skips as part of the uploaded data and uses them to calculate the upload speed, which results in incorrect speed and ETA information.
I would expect Duplicacy to only take into account chunks that were actually uploaded.

The problem is, you'll never know how many chunks need to be uploaded, so any estimate is either too optimistic or too pessimistic. The current stats are too optimistic, but if most chunks already exist on the storage then a number based only on actually uploaded chunks will be off by a lot.
You are correct, it is not possible to know how many chunks are left to be uploaded, but you do know how many you didn't have to upload, and the more real data you use, the better.
Right now you are using data that you know for a fact is not true. If you ignore the skipped chunks, you still don't know how many will eventually be skipped, but you can simply exclude every skipped chunk from the average-speed calculation and leave the speed/ETA fields blank when a chunk is skipped.
Something like this:
Skipped chunk 3138 size 2182974, --Kb/s --:--:-- 81.8%
Skipped chunk 3139 size 1297990, --Kb/s --:--:-- 81.8%
Skipped chunk 3140 size 3233159, --Kb/s --:--:-- 81.9%
Uploaded chunk 3141 size 6226667, 511Kb/s 00:01:45 81.9%
Uploaded chunk 3142 size 6928052, 511Kb/s 00:01:40 81.9%
Uploaded chunk 3143 size 9398697, 511Kb/s 00:01:35 82.0%
Uploaded chunk 3144 size 1685966, 511Kb/s 00:01:30 82.0%
Uploaded chunk 3145 size 6351912, 511Kb/s 00:01:25 82.0%
Uploaded chunk 3146 size 1645265, 511Kb/s 00:01:20 82.0%
Skipped chunk 3147 size 3233159, --Kb/s --:--:-- 82.1%
Skipped chunk 3148 size 3233159, --Kb/s --:--:-- 82.1%
Uploaded chunk 3149 size 1645265, 511Kb/s 00:01:05 82.1%
Uploaded chunk 3150 size 1645265, 511Kb/s 00:01:00 82.2%
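
To make the suggestion concrete, here is a minimal sketch (not Duplicacy's actual code; all type and field names are made up) of a progress reporter that advances the percentage for skipped chunks but bases speed and ETA only on bytes that were actually uploaded:

```go
package main

import (
	"fmt"
	"time"
)

// progress tracks a backup where skipped chunks advance the percentage
// but do not contribute to the speed/ETA calculation.
type progress struct {
	startTime     time.Time
	totalBytes    int64
	doneBytes     int64 // uploaded + skipped, used only for the percentage
	uploadedBytes int64 // only bytes that actually went over the wire
}

func (p *progress) report(chunk int, size int64, skipped bool) {
	p.doneBytes += size
	percent := float64(p.doneBytes) / float64(p.totalBytes) * 100

	if skipped {
		// Speed and ETA stay blank for skipped chunks.
		fmt.Printf("Skipped chunk %d size %d, --KB/s --:--:-- %.1f%%\n", chunk, size, percent)
		return
	}

	p.uploadedBytes += size
	elapsed := time.Since(p.startTime).Seconds()
	if elapsed <= 0 {
		elapsed = 1e-9
	}
	speed := float64(p.uploadedBytes) / elapsed // bytes/second from real uploads only
	eta := int64(float64(p.totalBytes-p.doneBytes) / speed)
	fmt.Printf("Uploaded chunk %d size %d, %.0fKB/s %02d:%02d:%02d %.1f%%\n",
		chunk, size, speed/1024, eta/3600, (eta%3600)/60, eta%60, percent)
}

func main() {
	p := &progress{startTime: time.Now(), totalBytes: 100 << 20}
	p.report(1, 4<<20, true) // chunk already exists on the storage
	for i := 2; i <= 4; i++ {
		time.Sleep(200 * time.Millisecond) // stand-in for the real upload
		p.report(i, 4<<20, false)
	}
}
```

Note that the ETA here still assumes the remaining bytes all need uploading, which is the "too pessimistic" side of the trade-off mentioned above.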
The optimistic number is interesting, but a "real" number is useful. Resuming an interrupted rsync --partial --progress does the same thing, but it uses a small window, so the real speed shows up quickly once it gets past the already-transferred part.
I suggest showing both numbers and calling it "in/out" or "processed/uploaded" or "read/write". Then you can see how efficiently chunks are skipped as well as how fast new chunks are uploaded. This will work best if the calculation window isn't the entire run time: an overall summary at the end can show the total in/out speeds, and a 1-minute average is better for when you're staring at the log.
I'll vote for the small window approach.
+1 for a small window approach too.
In terms of calculating and reporting the speed on screen, would it make sense to display it as Read Speed: 120MB/s, Write Speed: 3MB/s? The read speed would be derived from the reading/hashing that always has to happen, and the write speed would be recalculated each time an upload occurs, so that only actual uploads affect that number and it is an average over time. Using a small window would definitely help make the result more current.
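
For illustration, a sliding-window rate tracker could look roughly like this (a sketch with hypothetical names, not Duplicacy's code): one instance fed by every chunk that is read/hashed, a second fed only by chunks that are actually uploaded, so both displayed speeds reflect roughly the last minute rather than the whole run.

```go
package main

import (
	"fmt"
	"time"
)

type sample struct {
	at    time.Time
	bytes int64
}

// RateWindow keeps byte-count samples for a fixed duration and reports
// the average rate over the samples still inside that window.
type RateWindow struct {
	window  time.Duration
	samples []sample
}

func NewRateWindow(window time.Duration) *RateWindow {
	return &RateWindow{window: window}
}

// Add records bytes processed now and drops samples that fell out of the window.
func (r *RateWindow) Add(bytes int64) {
	now := time.Now()
	r.samples = append(r.samples, sample{at: now, bytes: bytes})
	cutoff := now.Add(-r.window)
	for len(r.samples) > 0 && r.samples[0].at.Before(cutoff) {
		r.samples = r.samples[1:]
	}
}

// Rate returns the average bytes/second over the current window.
func (r *RateWindow) Rate() float64 {
	if len(r.samples) < 2 {
		return 0
	}
	var total int64
	for _, s := range r.samples {
		total += s.bytes
	}
	elapsed := r.samples[len(r.samples)-1].at.Sub(r.samples[0].at).Seconds()
	if elapsed <= 0 {
		return 0
	}
	return float64(total) / elapsed
}

func main() {
	readRate := NewRateWindow(time.Minute)
	writeRate := NewRateWindow(time.Minute)

	// Simulate a few chunks: every chunk is read/hashed, only some are uploaded.
	for i := 0; i < 5; i++ {
		readRate.Add(4 << 20)
		if i%2 == 0 {
			writeRate.Add(4 << 20)
		}
		time.Sleep(100 * time.Millisecond)
	}

	fmt.Printf("Read Speed: %.1fMB/s, Write Speed: %.1fMB/s\n",
		readRate.Rate()/(1<<20), writeRate.Rate()/(1<<20))
}
```

With a one-minute window the reported speed recovers quickly after a long run of skipped chunks, which is what the rsync comparison above is getting at.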
Also, it's worth mentioning here that -stats could do with being added to the copy command, #271.
I just ran a first-time backup on a repository, and it skipped a few chunks in between. Could anyone tell me why chunks are being skipped?
A third option would be to remove skipped file sizes from the total file size instead of adding them to the done file size, in case that's easier to implement.
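
The arithmetic for that option would be roughly the following (a hypothetical sketch, with made-up variable names, not code from Duplicacy):

```go
package main

import "fmt"

// etaSeconds shrinks the total by the skipped bytes instead of counting
// them as uploaded, and derives speed from real uploads only.
func etaSeconds(totalBytes, uploadedBytes, skippedBytes int64, elapsedSeconds float64) float64 {
	if uploadedBytes == 0 || elapsedSeconds <= 0 {
		return 0
	}
	speed := float64(uploadedBytes) / elapsedSeconds
	remaining := totalBytes - uploadedBytes - skippedBytes
	return float64(remaining) / speed
}

func main() {
	// Example: 10 GB total, 2 GB uploaded in 1000 s, 4 GB skipped -> 2000 s left.
	fmt.Printf("ETA: %.0f seconds\n", etaSeconds(10<<30, 2<<30, 4<<30, 1000))
}
```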
@TheBestPessimist Thank you for your attempt at the small-window approach, but sadly it was closed and not merged… what happened there?