
(suggestion) Progress bar for duplicate file checks

Open pixelb opened this issue 9 years ago • 13 comments

Original issue 17 created by pixelb on 2009-01-12T15:47:00.000Z:

A progress bar of sorts giving an indication of how long a search will take would be a nice improvement, especially when searching large partitions with many duplicates spread all over the place ;)

pixelb avatar Mar 12 '15 18:03 pixelb

Comment #1 originally posted by pixelb on 2009-01-12T15:58:04.000Z:

I can't see how to figure out how much processing is required ahead of time?

pixelb avatar Mar 12 '15 18:03 pixelb

Comment #2 originally posted by pixelb on 2009-01-12T16:25:11.000Z:

I guess approximate and break it into stages.

The first 10% would be attributed to getting the number of files to process, i.e. processing the file metadata, including filtering out files with a unique file size.

Then I could increment from 10 - 80% with feedback from md5sum processing.

Then the last 10% for other stuff.

The feedback from each stage though would be quite invasive, so not an easy feature to implement at all. I'll think about it.
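A minimal sketch of how the staged percentages above could be combined into one overall figure. The stage names and the `overall_percent` helper are hypothetical, and the final stage is assumed to cover whatever remains after md5sum processing (80% to 100%); none of this is taken from the fslint code.

```python
# Hypothetical stage layout based on the percentages suggested above.
# Assumption: the last stage covers the remainder of the bar (80-100%).
STAGES = {
    "scan": (0.0, 10.0),      # read file metadata, drop files with a unique size
    "hash": (10.0, 80.0),     # feedback from md5sum processing
    "finish": (80.0, 100.0),  # everything else
}


def overall_percent(stage, fraction_done):
    """Map a stage-local fraction (0.0 to 1.0) onto the overall 0-100 scale."""
    start, end = STAGES[stage]
    # Clamp so a misbehaving stage cannot push the bar outside its range.
    fraction_done = min(max(fraction_done, 0.0), 1.0)
    return start + (end - start) * fraction_done
```

With this layout, being halfway through the hashing stage would report 45% overall.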

pixelb avatar Mar 12 '15 18:03 pixelb

Comment #3 originally posted by pixelb on 2009-03-06T01:43:05.000Z:

I think what would be nice is feedback of any kind.

What files it is touching, and the tasks it is performing.

pixelb avatar Mar 12 '15 18:03 pixelb

Comment #4 originally posted by pixelb on 2010-10-05T06:41:26.000Z:

Definitely. It's driving me crazy just sitting there for 4+ hours with no indication of how much progress it has made or how much work is left. I don't know whether to go to sleep and check it in the morning or just stay up another hour and wait for it...

pixelb avatar Mar 12 '15 18:03 pixelb

Comment #5 originally posted by pixelb on 2011-07-06T07:01:27.000Z:

Hello, I have installed FSlint and although it's good, I am having it scan about 5 hard drives. It's taken an entire day and I have no indication of how soon it will finish. I would suggest the following scheme to show progress. I assume FSlint proceeds in phases. From your docs I'm assuming your algorithm works internally like this:

  1. Phase 1: Get all files, get their file sizes.
  2. Phase 2: For all files with the same sizes, calculate md5sum on those files.

The output of duplicates shows what files have the same md5 / sha1.

For phase 1, I would love to see a progress bar which shows total files to be scanned & the total files completed scanning. I am not sure if there is any problem in getting the total number of files ahead of time (you already know what directories you're scanning).

For phase 2, you already know how many files have the same sizes. You can show the number of files whose md5s you've calculated and the number of files remaining.
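The two phases described above could be sketched roughly as follows. This is an illustration of the idea only, not how fslint is implemented: the function names and the `report` callback are made up for the example, and error handling (unreadable files, files changing during the scan) is left out.

```python
import hashlib
import os
from collections import defaultdict


def group_by_size(roots):
    """Phase 1: walk the trees and keep only sizes shared by 2+ files."""
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)
    return {size: paths for size, paths in by_size.items() if len(paths) > 1}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(roots, report=print):
    """Phase 2: hash the candidates, reporting files done out of the known total."""
    candidates = group_by_size(roots)
    total = sum(len(paths) for paths in candidates.values())
    done = 0
    by_hash = defaultdict(list)
    for size, paths in candidates.items():
        for path in paths:
            by_hash[(size, md5_of(path))].append(path)
            done += 1
            report(f"hashed {done}/{total} files")
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Because the candidate list is complete before any hashing starts, the total in phase 2 is exact, while phase 1 has no known total until the walk finishes.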

Let me know if you like these suggestions.

pixelb avatar Mar 12 '15 18:03 pixelb

Comment #6 originally posted by pixelb on 2011-07-06T08:39:22.000Z:

Phase 1: You couldn't get the number of files to scan ahead of time, so this would need to just be a "scanning..." status. The good thing is that would be relatively short.

Phase 2: Yes, at this stage we know the number of files to check. It would be good to provide feedback on progress here.

pixelb avatar Mar 12 '15 18:03 pixelb

Comment #7 originally posted by pixelb on 2011-08-03T07:54:00.000Z:

I regularly scan my large data set, 10+ TB and would LOVE to see progress bars.

I would simply take a sum of the file sizes needed to scan and adjust it for what files you have already scanned. If all the files sum to 10000 bytes and you have scanned two files that add up to 5000 bytes you have 50%. You take the average time it takes to scan X bytes and estimate the remaining time at that speed.
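As a rough sketch of that arithmetic (the `byte_progress` function is a made-up name for illustration, and it assumes the scan speed stays close to the average seen so far):

```python
def byte_progress(total_bytes, scanned_bytes, elapsed_seconds):
    """Return (percent_done, eta_seconds) weighted by bytes, not file count.

    eta_seconds is None until there is enough data to estimate a rate.
    """
    if total_bytes <= 0:
        return 100.0, 0.0
    percent = 100.0 * scanned_bytes / total_bytes
    if scanned_bytes <= 0 or elapsed_seconds <= 0:
        return percent, None
    rate = scanned_bytes / elapsed_seconds  # average bytes per second so far
    return percent, (total_bytes - scanned_bytes) / rate


# The example from the comment: 5000 of 10000 bytes scanned, here after 20 seconds.
print(byte_progress(10000, 5000, 20))
```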

I regularly do this for things on the command line such as doing a "date; df -h" and several minutes later repeating. Take the size difference / time difference and multiply it out and I get a rough idea. Not looking for microsecond accuracy but sometimes I've let it run all weekend with not a clue as to when it will be done, or how many files/size it's got left.

pixelb avatar Mar 12 '15 18:03 pixelb

OMG this feature is needed. Yes, it's impossible to calculate from the start... but if you collect a file list first, and then get an access speed average, you can calculate some flying-average approximation of ETA, or at least a % file completed. That is, assuming it's done that way :)
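One way such a flying average might look, as a sketch only: the `MovingRateEta` class and its window size are assumptions for the example, and it presumes the total size of the collected file list is already known.

```python
from collections import deque


class MovingRateEta:
    """Estimate time remaining from the throughput of the most recent files."""

    def __init__(self, total_bytes, window=20):
        self.remaining = total_bytes
        # Each sample is (bytes processed, seconds taken) for one file.
        self.samples = deque(maxlen=window)

    def update(self, nbytes, seconds):
        """Record one finished file."""
        self.remaining = max(self.remaining - nbytes, 0)
        self.samples.append((nbytes, seconds))

    def eta_seconds(self):
        """Return the estimated seconds left, or None if no rate is known yet."""
        window_bytes = sum(b for b, _ in self.samples)
        window_seconds = sum(s for _, s in self.samples)
        if window_bytes <= 0 or window_seconds <= 0:
            return None
        return self.remaining * window_seconds / window_bytes
```

Using only the last few samples lets the estimate follow changes in speed, for example when the scan moves from a fast disk to a slow one, at the cost of a jumpier number.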

george-viaud avatar Oct 21 '16 00:10 george-viaud

I hope this can be added.

claell avatar Jan 01 '17 13:01 claell

I'd also like to throw in support for this feature. I fiddled for a while with the CLI tools before I realized I'd just have to wait.

vidia avatar Feb 24 '17 05:02 vidia

I also feel this would be a nice feature but it may be impossible as there are so many variables involved.

Would it be possible to have a count of files and a percentage of those complete?

At least a status of files or directories being scanned in an abbreviated format on the screen. It would be an indication that there is activity occurring in the process.
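A small sketch of such an abbreviated status line, assuming a terminal that honours carriage returns. The `StatusLine` class is hypothetical and not part of fslint; updates are throttled so that printing does not slow the scan itself.

```python
import sys
import time


class StatusLine:
    """Overwrite a single terminal line with the path currently being scanned."""

    def __init__(self, interval=0.5, stream=sys.stderr, clock=time.monotonic):
        self.interval = interval  # minimum seconds between redraws
        self.stream = stream
        self.clock = clock
        self.last = None

    def show(self, path, width=60):
        """Draw the path if enough time has passed; return True if drawn."""
        now = self.clock()
        if self.last is not None and now - self.last < self.interval:
            return False
        self.last = now
        if len(path) > width:
            # Keep the tail of the path, which usually holds the file name.
            path = "..." + path[-(width - 3):]
        self.stream.write("\r" + path.ljust(width))
        self.stream.flush()
        return True
```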

RWL-69 avatar Apr 20 '17 04:04 RWL-69

@RWL-69 Yes, I agree. An intermediate and similar feature would be the option to display how many files had been processed by the various phases of the fslint algorithm.

snow-abstraction avatar Aug 04 '17 09:08 snow-abstraction

Perhaps as an alternative, a verbose output console-like stream, or changing stats that can indicate some activity? Granted I'm suggesting this only to show that the process is still active, but it may provide the reassurance some of us seek.

hpka avatar Dec 12 '17 15:12 hpka