
Progress bar misleading due to 5% of the files taking 95% of the hashing time (switch to MB/s?)

Open Fifis opened this issue 3 years ago • 1 comments

The hashing progress does not show the real progress and is not informative for estimating the completion time.

File sizes on most file systems are described rather well by a log-normal distribution, so file-count progress becomes (up to some non-degenerate transformation) exponentially slower because of the heavy tail. This is not how a UI should work. For example, when disks are copied, read-write progress is always shown in MB, not in files. Of course, with many small files the actual MB/s drops because of per-file attribute and alignment overhead, but that is not a problem while hashing the small files at the beginning. The problem is the long wait at the end, which the user does not expect: the last 5% of the files take 40x more time (measured on my computer) than the first 95%.
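To put a number on that, here is a back-of-the-envelope calculation that takes the 40x measurement at face value. The symbol $t_{95}$ (time spent hashing the first 95% of the files) is introduced only for this illustration:

```latex
\[
t_{\text{total}} = t_{95} + 40\,t_{95} = 41\,t_{95},
\qquad
\frac{t_{95}}{t_{\text{total}}} = \frac{1}{41} \approx 2.4\%
\]
```

So under that measurement, a file-count bar reads 95% when only about 2.4% of the total hashing time has elapsed.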

Two possible solutions:

  1. To save computational effort, once the small files (<1 MB) have been processed, measure the time it takes to fully hash X MB using the last 10 files larger than 1 MB (to reduce the effect of per-file overhead), and use the resulting MB/s figure to estimate the time for the remaining Y MB. Make the main progress bar show completeness in terms of MB, not files.
  2. Write the base name and the size of the current file under the progress bar. This way, the user will know that the program is not frozen and that the file is just being processed (relevant for large drives with many large files, e.g. HDDs). At least they could see which file is taking so long. This is related to issue #635, where the hashing could not be interrupted and the user would not have the foggiest idea how long that one file would take.
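Solution 1 could be sketched roughly as follows. This is only an illustration of the idea, not czkawka's actual code: `ThroughputEstimator`, `record`, `eta`, and the two constants are hypothetical names, and the caller is assumed to know the total byte count up front and to time each file itself.

```rust
use std::collections::VecDeque;
use std::time::Duration;

// Hypothetical constants taken from the proposal above, not from czkawka.
const LARGE_FILE_THRESHOLD: u64 = 1024 * 1024; // only files above 1 MB feed the rate
const WINDOW: usize = 10; // number of recent large files to average over

/// Hypothetical helper: byte-based progress plus an ETA from recent throughput.
struct ThroughputEstimator {
    recent: VecDeque<(u64, Duration)>, // (bytes, hashing time) of recent large files
    bytes_done: u64,
    bytes_total: u64,
}

impl ThroughputEstimator {
    fn new(bytes_total: u64) -> Self {
        Self { recent: VecDeque::with_capacity(WINDOW), bytes_done: 0, bytes_total }
    }

    /// Record one finished file; small files count towards progress but not the rate.
    fn record(&mut self, size: u64, elapsed: Duration) {
        self.bytes_done += size;
        if size > LARGE_FILE_THRESHOLD {
            if self.recent.len() == WINDOW {
                self.recent.pop_front(); // drop the oldest sample
            }
            self.recent.push_back((size, elapsed));
        }
    }

    /// Fraction of all bytes hashed so far, in [0, 1].
    fn byte_progress(&self) -> f64 {
        if self.bytes_total == 0 {
            return 1.0;
        }
        self.bytes_done as f64 / self.bytes_total as f64
    }

    /// Bytes per second over the window; None until a large file has been timed.
    fn bytes_per_sec(&self) -> Option<f64> {
        let bytes: u64 = self.recent.iter().map(|(b, _)| *b).sum();
        let secs: f64 = self.recent.iter().map(|(_, d)| d.as_secs_f64()).sum();
        if secs > 0.0 { Some(bytes as f64 / secs) } else { None }
    }

    /// Estimated time left for the remaining bytes at the current rate.
    fn eta(&self) -> Option<Duration> {
        let remaining = self.bytes_total.saturating_sub(self.bytes_done) as f64;
        self.bytes_per_sec()
            .map(|rate| Duration::from_secs_f64(remaining / rate))
    }
}
```

The hashing loop would call `record(size, elapsed)` after each file and draw the main bar from `byte_progress()`. Until the first file larger than 1 MB has finished, `eta()` returns `None`, so the UI would have to show no estimate rather than a guess.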

Obligatory XKCD.

Fifis, Feb 23 '22 14:02

Maybe there is another solution: the progress bar currently shows "completedNumberOfFiles/totalFileNumber". What about a second progress bar underneath showing "sumOfCompletedFileSizes/sumOfAllFilesSize"? That way you wouldn't have to compute an MB/s value, but the user could see that while nearly all of the files are done, perhaps only half of the total data has been hashed.
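A minimal sketch of the two fractions such a pair of bars would display. `HashProgress` and its fields are hypothetical names chosen for this example, not czkawka's actual progress type:

```rust
/// Hypothetical snapshot of hashing progress; not czkawka's actual type.
#[derive(Debug, Clone, Copy)]
struct HashProgress {
    files_done: u64,
    files_total: u64,
    bytes_done: u64,
    bytes_total: u64,
}

/// done/total as a value in [0, 1]; an empty job counts as complete.
fn fraction(done: u64, total: u64) -> f64 {
    if total == 0 { 1.0 } else { done as f64 / total as f64 }
}

impl HashProgress {
    /// Value for the existing bar: completed files / total files.
    fn file_fraction(&self) -> f64 {
        fraction(self.files_done, self.files_total)
    }

    /// Value for the proposed second bar: completed bytes / total bytes.
    fn byte_fraction(&self) -> f64 {
        fraction(self.bytes_done, self.bytes_total)
    }

    /// Text that could be shown next to the two bars.
    fn label(&self) -> String {
        format!(
            "files {}/{} ({:.0}%), data {}/{} B ({:.0}%)",
            self.files_done,
            self.files_total,
            self.file_fraction() * 100.0,
            self.bytes_done,
            self.bytes_total,
            self.byte_fraction() * 100.0
        )
    }
}
```

With 95 of 100 files done but only 500 of 1000 bytes hashed, the first bar would sit at 95% and the second at 50%, which is exactly the mismatch the second bar is meant to expose.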

llvs, Jan 18 '24 11:01