fdupes
fdupes: option to sort by size
From @sandrotosi on December 20, 2015 14:4
From matrixhasu on October 08, 2009 21:58:52
Debian bug #383962
- http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=383962

I would like an option to sort the list of duplicates by file size (both ascending and descending). This would be especially useful for the interactive mode, but it might also be useful for the listing mode.
Original issue: http://code.google.com/p/fdupes/issues/detail?id=3
Copied from original issue: sandrotosi/fdupes-issues#3
From [email protected] on February 04, 2014 17:35:41
This doesn't even have to be a hard feature. It could be implemented easily with a small change to the output format, nothing more.
For example, here's some fdupes output with --size:
648 bytes each:
./2014-01-15/javascript/jshomepage.js
./2013-12-26/javascript/jshomepage.js
28951 bytes each:
./2014-01-15/javascript/jsencryption.js
./2013-12-26/javascript/jsencryption.js
3014 bytes each:
./2014-01-15/javascript/jsrentblackbox.js
./2013-12-26/javascript/jsrentblackbox.js
This could be parsed and sorted by another script, but with difficulty. At least, I don't see any obvious one-liner shell pipeline which could do it.
However! If the newlines are deleted, so that the output instead looks like this:
648 bytes each: ./2014-01-15/javascript/jshomepage.js ./2013-12-26/javascript/jshomepage.js
28951 bytes each: ./2014-01-15/javascript/jsencryption.js ./2013-12-26/javascript/jsencryption.js
3014 bytes each: ./2014-01-15/javascript/jsrentblackbox.js ./2013-12-26/javascript/jsrentblackbox.js
then there is immediately an easy shell pipeline, fdupes dir/ --size | sort --general-numeric-sort, which produces:
648 bytes each: ./2014-01-15/javascript/jshomepage.js ./2013-12-26/javascript/jshomepage.js
3014 bytes each: ./2014-01-15/javascript/jsrentblackbox.js ./2013-12-26/javascript/jsrentblackbox.js
28951 bytes each: ./2014-01-15/javascript/jsencryption.js ./2013-12-26/javascript/jsencryption.js
Don't like the preceding whitespace? Toss in a `| uniq` to squeeze the blank lines.
(Of course, once you've made it this far, it might occur to you that you could delete the newlines yourself, but it's not obvious how to group each set of files without some kind of context. At least, I can't figure out a reasonable sed invocation; it's doable somehow, but most users certainly couldn't figure it out.)
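For what it's worth, awk's paragraph mode (an empty record separator) can supply exactly that grouping context, so the stock blank-line-separated output can be sorted without patching fdupes. This is only a hedged sketch, assuming no filename contains a tab character (tabs are used as the temporary join separator); the printf stands in for real `fdupes --size` output:

```shell
# Sort duplicate groups by size using only standard tools.
# Assumes no tab characters appear in filenames (tabs become the
# temporary field separator). The printf below fakes fdupes output;
# replace it with `fdupes --size -r dir/` for real use.
printf '%s\n' \
  '648 bytes each:' './a/jshomepage.js' './b/jshomepage.js' '' \
  '28951 bytes each:' './a/jsencryption.js' './b/jsencryption.js' '' \
  '3014 bytes each:' './a/jsrentblackbox.js' './b/jsrentblackbox.js' |
awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" } { $1 = $1; print }' |  # join each group onto one line
sort -n |                                                            # numeric sort on the leading size
awk '{ gsub(/\t/, "\n"); print $0 "\n" }'                            # restore one-file-per-line groups
```

The middle awk reads blank-line-separated paragraphs as single records and rebuilds them tab-joined, so sort sees one line per group and orders them by the leading byte count; the last awk undoes the join.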
I also wanted this feature and implemented it within the fdupes C code, using an adapted merge-sort algorithm for linked lists (from GeeksforGeeks), since fdupes.c stores the files in a linked list. If there's interest, I can clean it up, add a command line option for it, and create a pull request. Anyone interested?
I just went ahead and created a pull request.
Wow! This can be super useful if you need to quickly free some space by deleting the biggest duplicates.
BTW, if you need a workaround, this should work unless your filenames contain the string "B@R@E@A@K":
fdupes -rnS . | sed -e 's/^$/B@R@E@A@K/g' | tr '\n' '\0' | sed -e 's/B@R@E@A@K\x00/\n/g' | sort -rn | tr '\0' '\n' | tee fdupes.txt
What happened? Nothing? :(
Hi, is there a way to do this today, or is it still being considered?
There's currently no way to do this. For those like Tomas who are looking to delete only files above a certain size, the new --minsize option may prove useful.
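As a sketch of that suggestion (the 10 MiB threshold, the -r flag, and the ~/photos path are illustrative assumptions, not from the thread): --minsize takes a plain byte count, so larger units need converting first.

```shell
# --minsize expects bytes, so compute the threshold explicitly.
min=$((10 * 1024 * 1024))   # 10 MiB = 10485760 bytes (illustrative threshold)
# Restrict duplicate scanning to files of at least that size.
# `echo` is left in so this sketch is safe to run as-is; drop it to
# actually invoke fdupes. ~/photos is a placeholder path.
echo fdupes -r --minsize="$min" ~/photos
```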