
fdupes: option to sort by size

Status: Open. sandrotosi opened this issue 9 years ago (8 comments).

From @sandrotosi on December 20, 2015 14:4

From matrixhasu on October 08, 2009 21:58:52

Debian bug #383962 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=383962 I would like an option to sort the list of duplicates by file size (both ascending and descending). This would be especially useful for the interactive mode, but it might also be useful for the listing mode.

Original issue: http://code.google.com/p/fdupes/issues/detail?id=3

Copied from original issue: sandrotosi/fdupes-issues#3

sandrotosi, Dec 20 '15

From [email protected] on February 04, 2014 17:35:41

This doesn't even have to be a hard feature. It could be implemented easily with a small change to the output format, nothing more.

For example, here's some fdupes output with --size:

648 bytes each:
./2014-01-15/javascript/jshomepage.js
./2013-12-26/javascript/jshomepage.js

28951 bytes each:
./2014-01-15/javascript/jsencryption.js
./2013-12-26/javascript/jsencryption.js

3014 bytes each:
./2014-01-15/javascript/jsrentblackbox.js
./2013-12-26/javascript/jsrentblackbox.js

This could be parsed and sorted by another script, but with difficulty. At least, I don't see any obvious one-liner shell pipeline which could do it.

However! If the newlines are deleted, so that the output instead looks like this:

648 bytes each: ./2014-01-15/javascript/jshomepage.js ./2013-12-26/javascript/jshomepage.js

28951 bytes each: ./2014-01-15/javascript/jsencryption.js ./2013-12-26/javascript/jsencryption.js

3014 bytes each: ./2014-01-15/javascript/jsrentblackbox.js ./2013-12-26/javascript/jsrentblackbox.js

then there is immediately an easy shell pipeline: fdupes dir/ --size | sort --general-numeric-sort:

648 bytes each: ./2014-01-15/javascript/jshomepage.js ./2013-12-26/javascript/jshomepage.js
3014 bytes each: ./2014-01-15/javascript/jsrentblackbox.js ./2013-12-26/javascript/jsrentblackbox.js
28951 bytes each: ./2014-01-15/javascript/jsencryption.js ./2013-12-26/javascript/jsencryption.js

Don't like the blank lines that end up at the top? Toss in a `| uniq` to squeeze them into one.

(Of course, once you've made it this far, it might occur to you that you could delete the newlines yourself, but it's not obvious how to keep each set of files grouped without some sort of context. At least, I can't figure out a reasonable sed invocation, so while it's doable somehow, most users certainly won't figure it out.)
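The missing context is the blank line fdupes prints after each set, so a program that treats that blank line as the group separator can sort whole sets directly. Below is a minimal C sketch, assuming the "N bytes each:" header shown above (from --size) and one set per blank-line-delimited block. The names struct dup_group, read_groups, by_size_desc, print_groups and free_groups are hypothetical helpers, not part of fdupes, and the code relies on POSIX getline().

```c
#define _POSIX_C_SOURCE 200809L /* for getline() and fmemopen() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One duplicate set: its raw text plus the size parsed from the
 * "N bytes each:" header that fdupes --size prints first. */
struct dup_group {
    unsigned long long size;
    char *text; /* header and file lines, each newline-terminated */
};

static void free_groups(struct dup_group *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(v[i].text);
    free(v);
}

/* Append a finished group; its size is the leading number of its text. */
static int push_group(struct dup_group **v, size_t *n, size_t *cap, char *text)
{
    if (*n == *cap) {
        size_t ncap = *cap ? *cap * 2 : 16;
        struct dup_group *nv = realloc(*v, ncap * sizeof **v);
        if (nv == NULL)
            return -1;
        *v = nv;
        *cap = ncap;
    }
    (*v)[*n].size = strtoull(text, NULL, 10); /* 0 if there is no number */
    (*v)[*n].text = text;
    (*n)++;
    return 0;
}

/* Read blank-line-separated groups from `in`. Returns 0 on success. */
static int read_groups(FILE *in, struct dup_group **v, size_t *n)
{
    char *line = NULL, *buf = NULL;
    size_t linecap = 0, len = 0, cap = 0;
    ssize_t r;

    *v = NULL;
    *n = 0;
    while ((r = getline(&line, &linecap, in)) != -1) {
        if (r == 1 && line[0] == '\n') { /* blank line closes a group */
            if (buf != NULL) {
                if (push_group(v, n, &cap, buf) != 0)
                    goto fail;
                buf = NULL;
                len = 0;
            }
            continue;
        }
        char *nb = realloc(buf, len + (size_t)r + 1);
        if (nb == NULL)
            goto fail;
        buf = nb;
        memcpy(buf + len, line, (size_t)r + 1); /* copies the '\0' too */
        len += (size_t)r;
    }
    if (buf != NULL && push_group(v, n, &cap, buf) != 0)
        goto fail;
    free(line);
    return 0;
fail:
    free(line);
    free(buf);
    free_groups(*v, *n);
    *v = NULL;
    *n = 0;
    return -1;
}

/* qsort comparator: largest size first. */
static int by_size_desc(const void *a, const void *b)
{
    const struct dup_group *x = a, *y = b;
    return (x->size < y->size) - (x->size > y->size);
}

/* Print groups with a blank line after each, as in fdupes's own output. */
static void print_groups(FILE *out, const struct dup_group *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        fprintf(out, "%s\n", v[i].text);
}
```

A hypothetical main would call read_groups(stdin, &g, &n), then qsort(g, n, sizeof *g, by_size_desc), then print_groups(stdout, g, n), so the program could sit at the end of a pipeline such as fdupes -rS dir/ | ./sortgroups (binary name made up). Swapping the two terms in the comparator gives ascending order. qsort is not stable, so sets of equal size may come out in any order; a header without a leading number parses as size 0, and, like the sed approaches, this assumes no filename contains a newline.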

sandrotosi, Dec 20 '15

I also wanted this feature and implemented it in the fdupes C code, using an adapted merge-sort algorithm for linked lists from GeeksforGeeks, because fdupes.c stores the files in a linked list. If it's wanted, I can clean it up, add a command-line option for it, and create a pull request. Anyone interested?
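For readers curious what sorting a linked list by size involves, here is a minimal sketch of the textbook top-down merge sort for singly linked lists (split with slow/fast pointers, sort each half, merge). It is not the code from the pull request: struct file_node and the function names are hypothetical, and fdupes's own file record has more fields. The descending flag covers both orders requested in the original Debian bug.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; fdupes's own file record differs. */
struct file_node {
    const char *name;
    long long size;
    struct file_node *next;
};

/* Cut a list of two or more nodes into halves (slow/fast pointers). */
static void split_list(struct file_node *head,
                       struct file_node **front, struct file_node **back)
{
    assert(head != NULL && head->next != NULL);
    struct file_node *slow = head, *fast = head->next;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    *front = head;
    *back = slow->next;
    slow->next = NULL; /* terminate the front half */
}

/* Merge two sorted lists. Ties take the node from `a` first,
 * so the overall sort is stable. */
static struct file_node *merge_by_size(struct file_node *a,
                                       struct file_node *b, int descending)
{
    struct file_node dummy = {NULL, 0, NULL};
    struct file_node *tail = &dummy;
    while (a != NULL && b != NULL) {
        int take_a = descending ? a->size >= b->size : a->size <= b->size;
        if (take_a) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b; /* append whatever is left */
    return dummy.next;
}

/* Top-down merge sort: O(n log n) time, O(log n) recursion depth.
 * Returns the new head of the list. */
struct file_node *sort_by_size(struct file_node *head, int descending)
{
    if (head == NULL || head->next == NULL)
        return head;
    struct file_node *front, *back;
    split_list(head, &front, &back);
    front = sort_by_size(front, descending);
    back = sort_by_size(back, descending);
    return merge_by_size(front, back, descending);
}
```

Merge sort suits linked lists because it only relinks next pointers and never needs random access; a stable merge also keeps files of equal size in their original relative order.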

malkuh, Feb 23 '16

I just went ahead and created a pull request.

malkuh, Feb 23 '16

Wow! This could be super useful if you need to quickly free some space by deleting the largest duplicates.

Harvie, Feb 09 '17

BTW, if you need a workaround, this should work unless your filenames contain the string "B@R@E@A@K":

fdupes -rnS . | sed -e 's/^$/B@R@E@A@K/g' | tr '\n' '\0' | sed -e 's/B@R@E@A@K\x00/\n/g' | sort -rn | tr '\0' '\n' | tee fdupes.txt

Harvie, Feb 09 '17

What happened? Nothing? :(

IvanTurgenev, Feb 10 '17

Hi, is there a way to do this today, or is it still being considered?

nodecentral, Jan 09 '20

There's currently no way to do this. For those like Tomas who are looking to delete only files above a certain size, the new --minsize option may prove useful.


adrianlopezroche, Jan 09 '20