DupeGuru thinks that a newer copy of a file is the original

Open android441user opened this issue 8 years ago • 4 comments

Dear Virgil,

first of all, a big thank-you for releasing this valuable piece of software into the sphere of freedom!

I recently proposed that DupeGuru become part of the MX-16 project. MX-16 is an improved Debian that does not rely heavily on systemd, and it is becoming more and more popular. One of the project's repo maintainers kindly built a package from your 4.0.3 source code for the MX-Test Repo, and I'm presently testing it. You can see the discussion on it here: https://forum.mxlinux.org/viewtopic.php?f=121&t=41980.

However, upon testing DupeGuru, it seems to me that it has a very severe bug that threatens its very function. I'm not sure if it really is a bug, a configuration issue, or some strange behaviour related to MX-16 (and the way MX-16 deals with timestamps).

What I did was this:

I did a test run on a friend's historically grown data.

My friend has a folder on his HDD which he once created when he had to travel abroad. The background is: before he travelled, he copied one folder (let's call it "the original folder") to another folder, and then he copied that folder to a USB stick (which he took with him) in order not to lose his data in case the device he left at home got stolen.

So my friend has one prominent duplicate folder on his HDD with duplicate files in it. The original folder in question was created on March 8th, 2017. The copy of this folder was created approximately two hours later (at least that's what nemo tells me).

So you would think that DupeGuru would regard a file in the original folder as the original, and would prefer to keep the original and delete the duplicate copy (otherwise the application probably wouldn't be called "DupeGuru"). However, what DupeGuru actually does is offer the original for deletion, while not offering the (newer) copy for deletion (you can't mark it for deletion in the GUI).

Is this a bug, or a configuration issue, or an MX-16 issue?

As you can see in the discussion, one of the MX-16 developers guesses that DupeGuru perhaps cannot read timestamps, but I couldn't quite believe that.

Greetings, Joe

android441user avatar Apr 30 '17 15:04 android441user

Default prioritization rules are at https://github.com/hsoft/dupeguru/blob/master/core/scanner.py#L186 and https://github.com/hsoft/dupeguru/blob/master/core/scanner.py#L106 and currently don't include mtime. That can probably be improved.

In any case, however, it's always possible to adjust prioritization post-scan: https://www.hardcoded.net/dupeguru/help/en/faq.html#the-mark-box-of-a-file-i-want-to-delete-is-disabled-what-must-i-do

ghost avatar Apr 30 '17 16:04 ghost

Dear Virgil,

thanks a lot for your quick reply and for the explanation.

Manually readjusting prioritization wouldn't work in my friend's case, because there are just too many files. It would take weeks to change the priority for each and every one of them by clicking.

It would be wonderful if you could include a new prioritization rule that considers mtime, and an easy way of applying it from the very start.

I would happily ask MX-16's repo maintainers to package a version that includes this, so that they can put that DupeGuru version in the MX-Test Repo (and in the regular repo, too, once it has been thoroughly tested by the MX-16 community).

Greetings, and thanks for your great work,

Joe

android441user avatar May 01 '17 15:05 android441user

If you want to prioritize through mtime, you don't have to do it one by one! There are post-scan re-prioritization facilities. See: https://www.hardcoded.net/dupeguru/help/en/faq.html#i-want-to-make-my-latest-modified-files-reference-files-what-can-i-do

ghost avatar May 01 '17 15:05 ghost
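
To illustrate the idea behind prioritizing by mtime, here is a minimal Python sketch. It is not dupeGuru's code or API: the function name `order_by_mtime` and the convention that the first entry of a group is the reference are assumptions made only for this example.

```python
import os


def order_by_mtime(paths, newest_first=False):
    """Sort one group of duplicate paths by modification time.

    Assumption for this sketch: the first path in the returned list is
    treated as the reference, i.e. the file that is kept.
    """
    return sorted(paths, key=lambda p: os.stat(p).st_mtime, reverse=newest_first)
```

With `newest_first=True` the most recently modified file becomes the reference; with the default, the oldest one does.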

Dear Virgil,

thanks so much for the hint. We never saw that one because my friend only looked in the German FAQ, where this is not discussed.

I guess your hint brings us closer to a solution. My question would be, however, what my friend can do if he wants the file with the oldest mtime to be regarded as the reference, while only looking for duplicates that (apart from the mtime) are identical in name, file size and content.

I'm sorry I can't help my friend all by myself here. I suppose this is because I still haven't fully understood the logic behind DupeGuru. If possible, could you please add a simple standard workflow diagram to the FAQ, and write one paragraph in the FAQ defining exactly what "Delta values" are?

Sorry to bother you with something that's probably completely evident to you, but this might reduce the number of support requests in general. I read around for hours but I still don't completely get it.

Greetings, and have a nice weekend, Joe

android441user avatar May 05 '17 10:05 android441user
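
The requirement in the last comment (same name, same file size, same content, oldest mtime as reference) can be sketched as a small standalone Python script. This is only an illustration under stated assumptions, not how dupeGuru works internally: the helper names `file_digest` and `find_duplicates` are hypothetical, content equality is approximated by comparing SHA-256 digests, and only regular, readable files are considered.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return a list of (reference, duplicates) tuples.

    Files are duplicates when they share name, size and content digest.
    Within each group the file with the oldest mtime is the reference.
    """
    # Cheap first pass: bucket by (file name, size) before hashing anything.
    by_name_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path):
                    continue  # skip broken symlinks and special files
                by_name_size[(name, os.path.getsize(path))].append(path)

    groups = []
    for paths in by_name_size.values():
        if len(paths) < 2:
            continue
        # Second pass: confirm equal content via the digest.
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        for same in by_digest.values():
            if len(same) > 1:
                same.sort(key=lambda p: os.stat(p).st_mtime)
                groups.append((same[0], same[1:]))
    return groups
```

Calling `find_duplicates(["/path/to/original", "/path/to/copy"])` (placeholder paths) returns, for each group, the oldest file and the newer files that match it. If the copy was made with preserved timestamps, the mtimes are equal and the sort keeps the order in which the files were found, so mtime alone cannot tell original from copy in that case.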