
Show exact file size in dupe filter comparison

Open roachcord3 opened this issue 2 years ago • 2 comments

When it comes to comparing two files of nearly identical size, especially if they are pixel-for-pixel duplicates, and all else being equal, users will generally have a preference to keep the smaller file. The "roughly equal" file size comparator is an active hindrance in this case, because it forces the user to open the file in an external program to get the exact size, or to change the sig figs in their general file size display preferences, which makes a bunch of other UIs far less usable.

To resolve this, I propose that in the dupe filter's file size comparison, the actual size in bytes should be displayed in addition to the user-friendly file size (ToHumanBytes). Furthermore, the approximately equal case (≈) should be removed, so that even a one-byte difference is still very easy to spot.

I understand that you do not normally take pull requests, but in my case I have been using a patch on my client for several months now, so I think sharing this will give you an idea of exactly what I have in mind. Of course, I won't be offended if you don't use it; this is literally just to make sure my request is clearer.

diff --git a/hydrus/client/media/ClientMedia.py b/hydrus/client/media/ClientMedia.py
index 10259bf8..ebe6050d 100644
--- a/hydrus/client/media/ClientMedia.py
+++ b/hydrus/client/media/ClientMedia.py
@@ -173,7 +173,7 @@ def GetDuplicateComparisonStatements( shown_media, comparison_media ):
                 score = -duplicate_comparison_score_much_higher_filesize


-        elif absolute_size_ratio > 1.05:
+        else:

             if s_size > c_size:

@@ -186,18 +186,18 @@ def GetDuplicateComparisonStatements( shown_media, comparison_media ):
                 score = -duplicate_comparison_score_higher_filesize


-        else:
-
-            operator = CC.UNICODE_ALMOST_EQUAL_TO
-            score = 0
-
-
         if is_a_pixel_dupe:

             score = 0


-        statement = '{} {} {}'.format( HydrusData.ToHumanBytes( s_size ), operator, HydrusData.ToHumanBytes( c_size ) )
+        statement = '{} ({:,}B) {} {} ({:,}B)'.format(
+                HydrusData.ToHumanBytes( s_size ),
+                s_size,
+                operator,
+                HydrusData.ToHumanBytes( c_size ),
+                c_size,
+                )

         statements_and_scores[ 'filesize' ]  = ( statement, score )
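
To show what the resulting statement looks like, here is a rough standalone sketch of the formatting. Note that `to_human_bytes` is a hypothetical stand-in for `HydrusData.ToHumanBytes` (the real function's rounding and units may differ), `build_filesize_statement` is a name made up for this example, and the operator logic is simplified: it leaves out the "much higher" threshold and the scores, and the `=` for identical sizes is my own assumption.

```python
def to_human_bytes(size: int) -> str:
    """Hypothetical stand-in for HydrusData.ToHumanBytes; the real rounding may differ."""
    units = ['B', 'KB', 'MB', 'GB', 'TB']
    value = float(size)
    for unit in units:
        # Stop at the first unit that keeps the value under 1024 (or at the largest unit).
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    if unit == 'B':
        return f'{size} B'
    return f'{value:.2f} {unit}'


def build_filesize_statement(s_size: int, c_size: int) -> str:
    """Simplified illustration of the proposed statement: human size plus exact bytes."""
    if s_size > c_size:
        operator = '>'
    elif s_size < c_size:
        operator = '<'
    else:
        operator = '='  # assumption: how equal sizes are shown is not covered by the patch
    # Same format string as in the patch above.
    return '{} ({:,}B) {} {} ({:,}B)'.format(
        to_human_bytes(s_size),
        s_size,
        operator,
        to_human_bytes(c_size),
        c_size,
    )


print(build_filesize_statement(2000000, 1999999))
# 1.91 MB (2,000,000B) > 1.91 MB (1,999,999B)
```

With the human-readable sizes alone, both files would show as 1.91 MB; the exact byte counts are what make the one-byte difference visible.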

roachcord3, Jun 25 '22 23:06

users will generally have a preference to keep the smaller file

That's a misperception. Smaller files are more often "the wrong" files: files that have been stripped of their metadata by, for example, being posted on Discord. These files are usually not the originals, and their hashes do not match the PTR. So keep that in mind before declaring that all smaller files are better. The only good smaller files are the Huffman-optimized ones that are also tagged on the PTR.

TheElo, Jun 28 '22 07:06

@TheElo you would be able to tell if that's the case depending on the number of tags on the two files, which is of course another aspect shown in the dupe filter that can be given its own weight. Keep in mind that various art posting sites will add metadata that the original image didn't have, either. I thought about adding "all else being equal" to my original post, but decided not to. Since it seems to have confused you, I will edit it so that it is clear.

Besides, if it's your prerogative to prefer the larger file, then this feature request will help you too. It shows the exact file sizes, so there's no more guessing which one is the bigger file for you to waste disk space on; you'll be able to tell immediately.

roachcord3, Jun 28 '22 08:06