
Feature: Add support for finding duplicates

Open Worgle123 opened this issue 2 years ago • 13 comments

The Idea:

I have a very large number of files, and I have been constantly frustrated by the default duplicate finder, which only appears to find duplicate names. I would love to see this feature find its way into Files, as at the moment I have to use clunky paid software, which doesn't even appear to do a great job. Something that looked at file contents, or even just file size, would be amazing!

How it should work

The feature should have several layers of scanning. It could be started automatically when certain parameters are enabled (for example, name only, or name and then size), from the right-click menu of a folder or disk, or from the right-click menu when the user has selected one or more files (in which case it would only scan for duplicates of the selected files).
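A minimal sketch of the scoping rule, assuming the scan works on a flat list of file paths. Python is used only for illustration, and `candidate_pairs` is a hypothetical helper, not part of Files. When the user has selected files, only pairs involving at least one selected file are compared; otherwise every pair in the scanned folder or disk is a candidate.

```python
from itertools import combinations
from typing import Optional


def candidate_pairs(paths: list[str], selected: Optional[set[str]] = None) -> list[tuple[str, str]]:
    """Return unordered pairs of paths that should be checked for duplication.

    Hypothetical helper. If `selected` is given (files the user right-clicked),
    only pairs that include at least one selected file are returned, so the
    scan is limited to duplicates of those files. If `selected` is None
    (folder or disk scan), every pair in scope is a candidate.
    """
    pairs = []
    for a, b in combinations(sorted(paths), 2):
        if selected is None or a in selected or b in selected:
            pairs.append((a, b))
    return pairs


# Right-clicking only "a" compares it against the others, but not "b" against "c".
print(candidate_pairs(["c", "a", "b"], {"a"}))  # [('a', 'b'), ('a', 'c')]
```

Comparing every pair grows quadratically with the number of files, which is why cheap checks such as name and size should filter candidates before anything reads file contents.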

It could start by finding all files with similar names, not just identical ones, and could then be set up to run a more resource-intensive scan of file sizes to find duplicates. It would only move on to comparing file sizes after it had found the files with similar names, and it would only scan those files. The size comparison could have adjustable sensitivity to reduce its performance impact.

Maybe the system could automatically set a recommended sensitivity depending on file types. For example, when searching only through images, there is generally a decent-sized difference between files, so sensitivity could be turned down. Text documents, on the other hand, tend to have relatively consistent file sizes, so close to maximum sensitivity would be needed. If it is not too hard, it could also pick a recommended average setting when a scan covers a mix of file types. Even if this would not work when the types differ a lot, it would help if, say, there were just one or two rogue files in a scan, so the system would not break for the sake of one or two files.
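A minimal sketch of the two layers, assuming each file is known by its path and size in bytes. `difflib.SequenceMatcher` is a stand-in for whatever name-similarity measure Files might use, and the `0.8` name threshold and the per-extension `DEFAULT_TOLERANCE` values are hypothetical examples of "sensitivity": a tolerance of `0.0` demands an exact size match (text documents), while a looser tolerance suits images.

```python
from difflib import SequenceMatcher
from pathlib import PurePath

# Hypothetical per-type size tolerances (fraction of the larger file's size).
# Images can use a looser check; text documents need a near-exact match.
DEFAULT_TOLERANCE = {".jpg": 0.05, ".png": 0.05, ".txt": 0.0, ".docx": 0.01}


def names_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Layer 1: compare file stems (names without extension), ignoring case."""
    sa, sb = PurePath(a).stem.lower(), PurePath(b).stem.lower()
    return SequenceMatcher(None, sa, sb).ratio() >= threshold


def sizes_close(size_a: int, size_b: int, tolerance: float) -> bool:
    """Layer 2: sizes match within a relative tolerance (0.0 means identical)."""
    larger = max(size_a, size_b)
    if larger == 0:
        return True  # both files are empty
    return abs(size_a - size_b) / larger <= tolerance


def likely_duplicates(files: dict[str, int], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Run the size check only on pairs whose names already passed layer 1.

    `files` maps path -> size in bytes. Unknown extensions fall back to an
    exact size match.
    """
    paths = sorted(files)
    result = []
    for i, a in enumerate(paths):
        for b in paths[i + 1:]:
            if not names_similar(a, b, threshold):
                continue  # skip the size check entirely
            tolerance = DEFAULT_TOLERANCE.get(PurePath(a).suffix.lower(), 0.0)
            if sizes_close(files[a], files[b], tolerance):
                result.append((a, b))
    return result


print(likely_duplicates({"report.txt": 100, "report2.txt": 100, "notes.txt": 100}))
# [('report.txt', 'report2.txt')]
```

Because layer 1 filters first, the size comparison only runs on the small set of pairs with similar names, which is the performance benefit the layered design is aiming for.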

It would also have to be able to judge whether files are in the same format, as otherwise there could be accidental clashes (for example, between .CR3 and .JPEG files).
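A minimal sketch of a format check, assuming the first few bytes of each file have already been read. It treats extension spellings such as `.jpeg` and `.jpg` as the same format and, where a well-known signature is present (JPEG files start with `FF D8 FF`, PNG files with `89 50 4E 47 0D 0A 1A 0A`), trusts the file header over the extension. `EXTENSION_ALIASES` and `same_format` are hypothetical names, and a real implementation would need a much larger signature table.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical alias table: different spellings of the same format.
EXTENSION_ALIASES = {".jpeg": ".jpg", ".jpe": ".jpg", ".tif": ".tiff"}

# Leading "magic" bytes for a couple of common formats.
SIGNATURES = {
    b"\xff\xd8\xff": ".jpg",
    b"\x89PNG\r\n\x1a\n": ".png",
}


def normalized_ext(path: str) -> str:
    """Lower-case the extension and fold known aliases together."""
    ext = PurePath(path).suffix.lower()
    return EXTENSION_ALIASES.get(ext, ext)


def sniff(header: bytes) -> Optional[str]:
    """Return the format implied by a file's first bytes, or None if unknown."""
    for magic, ext in SIGNATURES.items():
        if header.startswith(magic):
            return ext
    return None


def same_format(path_a: str, header_a: bytes, path_b: str, header_b: bytes) -> bool:
    """Prefer the sniffed format; fall back to the normalized extension."""
    fmt_a = sniff(header_a) or normalized_ext(path_a)
    fmt_b = sniff(header_b) or normalized_ext(path_b)
    return fmt_a == fmt_b


print(same_format("IMG_01.CR3", b"\x00\x00\x00\x18ftyp", "IMG_01.JPEG", b"\xff\xd8\xff\xe0"))  # False
```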

If it found any matches, it could then go on to scan the actual contents of the files or (probably better) just open them in a separate window so the user can judge how similar they are. After the scan completed, it would open a grid/list view of all located duplicates (where you could check/uncheck suspected duplicates) and ask the user for the next step (e.g. delete, rename, or move to a different location).
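For the content-scanning option, a common technique is to group files by size and then hash only the files that share a size, so that unique sizes are never read. The sketch below uses SHA-256 from Python's `hashlib`; `exact_duplicate_groups` and `sha256_of` are hypothetical names. This only finds byte-identical files, so visually similar images (for example, a resized copy) would still need the manual side-by-side review described above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_duplicate_groups(paths: list[Path]) -> list[list[Path]]:
    """Return groups of byte-identical files, each group sorted by path.

    Files are bucketed by size first; only buckets with two or more files
    are hashed. Each returned group is a candidate row set for a
    check/uncheck review list.
    """
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in paths:
        by_size[p.stat().st_size].append(p)

    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in same_size:
            by_hash[sha256_of(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```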

In a Nutshell

Add an advanced duplicate finder.

Files Version

2.0.31.0

Windows Version

Windows 11, version 22H2

Comments

This could work both automatically (when a file with the same name is added to a folder) and as an option to scan specific areas.

Worgle123, Feb 21 '23 02:02