jdupes does not find some duplicates (windows)

Open AnselmD opened this issue 6 years ago • 21 comments

Hi,

I have a directory with a lot of files and hard links to files.

Let's call it k:\abc; the folder names d, e, f, g are only examples.

I run jdupes -r -t k:\abc\d

which should find duplicates, but it does not.

Then I run jdupes -r -t k:\abc\d\e\g k:\abc\d\f\g

and now duplicates are reported. I have no folder links.

This looks weird to me. Any hint as to what I am doing wrong?

I did some additional tests; the following also do not report duplicates:

jdupes -r -t k:\abc\d\e k:\abc\d\f
jdupes -r -t -H k:\abc\d\e k:\abc\d\f

AnselmD avatar Jun 20 '19 09:06 AnselmD

Sorry, I don't seem to have gotten an email notification of this issue for some reason, so I'm only seeing it now.

What Windows version and edition are you running? I test mainly on 64-bit Windows 7 SP1 and Windows 10 builds 1607+. If you are on XP, Vista, or something equally old or unusual, that may be related to the issue. What kind of device is at K: and what filesystem does it use?

In the latest Windows releases, I've packaged an extra executable called jdupes-loud.exe which supports the -@ debugging parameter. Assuming your file paths are not sensitive information, could you run something like this:

jdupes-loud -@rtH k:\abc\d > log_fail.txt
jdupes-loud -@rtH k:\abc\d\e k:\abc\d\f > log_pass.txt
dir /s /a: k:\abc\d > log_files.txt

Then zip or 7z the log_XXXXX.txt files up and email them to me at [email protected]? Depending on how many files/folders there are, the output files may get extremely large, so anything you can do to reduce the number of files that must be scanned to produce pass/fail results will help me.

jbruchon avatar Jun 29 '19 21:06 jbruchon

Hi Jody, no problem.

64-bit Windows 7 SP1. k: is an external USB 3 HD, 2 TB, NTFS; some folders are NTFS-compressed, and many files are already hard linked using AllDup.

At the moment I am making a partial copy to another drive, preserving the links. After that I will be able to remove some folders to see whether the error still exists.

After that I will continue with jdupes-loud.

Thank you for your reply, Anselm

AnselmD avatar Jun 30 '19 07:06 AnselmD

jdupes-loud -@rtH k:\abc\d > log_fail.txt
jdupes-loud -@rtH k:\abc\d\e k:\abc\d\f > log_pass.txt

I think you meant to redirect the error channel; this works for me (2> instead of >):

jdupes-loud -@rtH k:\abc\d 2> log_fail.txt

AnselmD avatar Jun 30 '19 08:06 AnselmD

You should redirect stderr to stdout to capture both: jdupes-loud -@rth k:\abc\d >log_fail.txt 2>&1
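
For anyone collecting such logs from a script instead of the cmd prompt, the same idea can be sketched in Python. This is only an illustration: `capture_combined` is a hypothetical helper name, and the child process here is a stand-in that writes to both streams, not jdupes itself.

```python
import subprocess
import sys


def capture_combined(cmd):
    """Run cmd and return stdout and stderr merged into one text stream."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # plays the role of the shell's 2>&1
        text=True,
        check=False,
    )
    return result.stdout


# Stand-in command (an assumption for the demo) that writes one line to
# stdout and one line to stderr.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]
combined = capture_combined(demo_cmd)
print(combined)
```

The relative order of the two lines in the merged text can vary with buffering, which is also true of a shell redirect.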

jbruchon avatar Jun 30 '19 13:06 jbruchon

I did not find a good hint in the log file.

I spent several hours trying to reduce the number of files, moving folders out and back in. I gave up on that and copied the folder to get rid of the hard links.

The problem still exists: the scanning process finishes, but the next step (I think it is "Processing...") does not start.

So I renamed all folders and all files to numbers.

The problem still exists. I replaced the contents of every file with a single line containing 123 and gave each the extension .txt. Processing still does not start.

Then I removed all files except 5.

I started to remove empty folders, but then the error goes away.

Tested with an HDD and an SSD (both NTFS), on Windows 7 Pro 64-bit and Windows 7 Pro 32-bit:

c:jdupes.exe -rHt c:\jdupestest
Scanning: 5 files, 101743 items (in 1 specified)

c:jdupes.exe -rt c:\jdupestest
Scanning: 5 files, 101743 items (in 1 specified)

c:jdupes.exe -r c:\jdupestest
Scanning: 5 files, 101743 items (in 1 specified)

I will create a 7z archive with the complete folder structure and try to attach it here:

AnselmD avatar Jul 06 '19 10:07 AnselmD

jdupestest.zip

I am only allowed to upload zip, so inside is a 7z archive.

AnselmD avatar Jul 06 '19 11:07 AnselmD

There are no files in that folder structure to find as duplicates, unless I missed something. I need the debug logs to figure it out; they'll tell me what comparisons are done and what is skipped and why it was skipped.

jbruchon avatar Jul 06 '19 19:07 jbruchon

There are five duplicate files in it.

C:\jdupestest>dir /s /a *.txt
 Volume in drive C has no label.
 Volume Serial Number is CCAD-8D51

 Directory of C:\jdupestest\113\9999__

05.07.2019  09:17                 6 80002.txt
               1 File(s)              6 bytes

 Directory of C:\jdupestest\156\29374

05.07.2019  09:17                 6 934432.txt
               1 File(s)              6 bytes

 Directory of C:\jdupestest\156\30495

05.07.2019  09:17                 6 936926.txt
               1 File(s)              6 bytes

 Directory of C:\jdupestest\233\48792

05.07.2019  09:18                 6 80000.txt
               1 File(s)              6 bytes

 Directory of C:\jdupestest\282\53141\77280_

05.07.2019  09:20                 6 800004.txt
               1 File(s)              6 bytes

     Total Files Listed:
               5 File(s)             30 bytes

I will create a log file.

AnselmD avatar Jul 06 '19 20:07 AnselmD

jdupes-loud.exe -@rt c:\jdupestest

jdupestest_log.zip

AnselmD avatar Jul 06 '19 21:07 AnselmD

OK, the log doesn't seem to indicate that the files were ever compared, but I did a fresh build and this is what I get with the provided data set:

C:\jdupestest>jdupes -nr .
Scanning: 0 files, 9365 dir^C(in 1 specified)
C:\jdupestest>jdupes -nrq .
.\113\9999__\80002.txt
.\156\29374\934432.txt
.\156\30495\936926.txt
.\233\48792\80000.txt
.\282\53141\77280_\800004.txt

C:\jdupestest>jdupes -nrHtq .
.\113\9999__\80002.txt
.\156\29374\934432.txt
.\156\30495\936926.txt
.\233\48792\80000.txt
.\282\53141\77280_\800004.txt

C:\jdupestest>jdupes -v
jdupes 1.13.1 (2019-06-10) 64-bit i32
Compile-time extensions: windows unicode noperm nosymlink
...

I am doing this on Windows 10 1903 64-bit, but Windows 7 should not be any different.

What version are you using? Did you grab a binary from here or did you build your own? jdupes -v

jbruchon avatar Jul 07 '19 15:07 jbruchon

I downloaded the binary from here.

jdupes.exe -v
jdupes 1.13.1 (2019-06-10) 32-bit
Compile-time extensions: windows unicode noperm nosymlink
Copyright (C) 2015-2019 by Jody Bruchon

AnselmD avatar Jul 07 '19 15:07 AnselmD

Can you download the 64-bit binary instead and see if it works?

jbruchon avatar Jul 07 '19 15:07 jbruchon

I started with the 64-bit version on the other PC. I can try the test again.

AnselmD avatar Jul 07 '19 15:07 AnselmD

I ran the test with the 64-bit version.

jdupes finds the duplicates if jdupestest.7z is inside the c:\jdupestest folder.

When I unpack jdupestest.7z to a fresh folder, the "error" is there. Removing jdupestest.7z from the folder c:\jdupestest does not help.

Weird!

AnselmD avatar Jul 07 '19 16:07 AnselmD

So now I have two folders whose contents should be identical:

c:\jdupestest (jdupes does not find duplicates)
c:\jdupestest.old (jdupes finds duplicates)

Next I copied all folders inside c:\jdupestest.old with Explorer to the new folder c:\jdupestest.old2:

c:\jdupestest.old2 (jdupes does not find duplicates)

AnselmD avatar Jul 07 '19 17:07 AnselmD

OK, here's what I'd like. Get me loud debug output logs, but get them by cding into the folder first, then check for the failure in that directory (jdupes -rq .), then do jdupes -@rq . >logX.txt 2>&1 where X is a number to differentiate between the logs. What I will do is run a diff -Nau over the two log files to find where things are falling apart, and possibly WHY things are falling apart. If you could shoot a zip of the two logs (no need to 7z them) I'll look over them.
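
If diff is not available on the Windows machine, a comparable unified diff can be produced with Python's standard difflib module. The following is a minimal sketch, not part of jdupes: `diff_logs` is a hypothetical helper, and the two short logs are made-up placeholders, not real jdupes-loud output.

```python
import difflib


def diff_logs(first_lines, second_lines, context=3):
    """Return a unified diff (as a list of lines) between two logs."""
    return list(
        difflib.unified_diff(
            first_lines,
            second_lines,
            fromfile="log1.txt",
            tofile="log2.txt",
            n=context,      # number of context lines, like diff -u
            lineterm="",    # input lines carry no trailing newline
        )
    )


# Made-up placeholder logs (an assumption for the demo).
log1 = ["scan start", "visit entry 1", "compare entry 1 with entry 2", "scan end"]
log2 = ["scan start", "visit entry 1", "skip entry 2", "scan end"]

for line in diff_logs(log1, log2):
    print(line)
```

For real logs, the line lists would come from reading log1.txt and log2.txt with `open(...).read().splitlines()`.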

jbruchon avatar Jul 07 '19 20:07 jbruchon

jdupeslogs.zip

What is the difference between traverse item right and left?

traverse item right '.\100\302'
traverse item left '.\100\302'

AnselmD avatar Jul 08 '19 16:07 AnselmD

Traversal refers to the binary tree of files built during the first phase where dirs and files get counted up. The algorithm is supposed to perform a depth-first traversal with each file entry along the way as the "first" file in a match check. From there, every file AFTER the "first" one in the tree is the "second" file which is checked against that one for a match. If there is a match, the match is registered against the first file's entry in the file tree, and the search continues. It goes until every file in the tree has been checked against every other file in the tree. A "traverse left" means to follow the left node of the current entry in the tree while a "traverse right" follows the right node.

For reference, an extremely simplistic file tree looks roughly like this:

     O
    / \
  O    O
 / \
O   O

Since the scan runs top-to-bottom and then left-to-right, traversal generally goes like this: if a left node exists, follow it; else if a right node exists, follow it; else process the current node. Once all L/R nodes under a node are traversed and processed, their parent node is processed on the way back up the tree, and the parent node is an L or R node for some other node unless it's the root node at the very top, so this process continues until the root node is the only one left unprocessed.
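
The visiting order described above (left subtree, then right subtree, then the node itself) can be sketched as follows. This is an illustration in Python under stated assumptions, not the actual jdupes C code: the `Node` class, the `traverse` function, and the node names are all hypothetical, and the tree has the same shape as the diagram above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def traverse(node: Optional[Node], processed: List[str]) -> None:
    """Follow the left node, then the right node, then process this node."""
    if node is None:
        return
    traverse(node.left, processed)    # "traverse left"
    traverse(node.right, processed)   # "traverse right"
    processed.append(node.name)       # processed on the way back up


# Same shape as the diagram: the root has two children, and the left
# child has two children of its own.
root = Node("root",
            left=Node("a", left=Node("a1"), right=Node("a2")),
            right=Node("b"))

order: List[str] = []
traverse(root, order)
print(order)  # ['a1', 'a2', 'a', 'b', 'root']
```

The root is processed last, which matches the statement that the process continues until the root node is the only one left unprocessed.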

If you are getting a right traversal instead of a left traversal and that's the difference between a match and no match, jdupes is not traversing properly. I've suspected that the core algorithm had some traversal issues for a while and this would only confirm my suspicions. (I did not write the core algorithm, it was inherited from fdupes.) I am rewriting jdupes from scratch because the core algorithm is not ideal anyway; in particular, because matching work is handled on a "match pair" basis and exclusions are done on pairs but not entire match sets, there are problems that arise where a pair is explicitly no-matched but then the no-match pair files both match a third file and get chained into a combined match set indirectly because of the third file.
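
The chaining problem with pair-based exclusions can be shown with a small model. This sketch does not reproduce how jdupes stores matches; it swaps in a simple union-find grouping to stand for "chaining", and `chain_match_sets`, `same_content`, and `excluded` are hypothetical names. Files A and B are explicitly excluded as a pair, yet both match C.

```python
from itertools import combinations


def chain_match_sets(files, is_match, is_excluded):
    """Group files by chaining pairwise matches, skipping excluded pairs."""
    parent = {f: f for f in files}

    def find(f):
        # Walk up to the representative of f's set, compressing the path.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in combinations(files, 2):
        if is_excluded(a, b):
            continue  # the exclusion applies to this pair only
        if is_match(a, b):
            parent[find(a)] = find(b)  # chain the two match sets together

    groups = {}
    for f in files:
        groups.setdefault(find(f), []).append(f)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def same_content(a, b):
    # Assumption for the demo: all three files have identical content.
    return True


def excluded(a, b):
    # Assumption for the demo: only the pair (A, B) is explicitly no-matched.
    return {a, b} == {"A", "B"}


match_sets = chain_match_sets(["A", "B", "C"], same_content, excluded)
print(match_sets)  # [['A', 'B', 'C']]
```

A and B end up in one set through C even though their own pair was skipped, which is why exclusions checked per pair are not enough; they would have to be checked against the whole match set.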

jbruchon avatar Jul 08 '19 17:07 jbruchon

Ah, okay, it is a binary tree built by the program.

AnselmD avatar Jul 13 '19 08:07 AnselmD

Hi Jody, do you have an idea when you will have a first version of your rewritten jdupes?

AnselmD avatar Sep 20 '19 14:09 AnselmD

My time to work on it is increasingly limited. I don't know when I'll be able to put out a 2.0

jbruchon avatar Sep 20 '19 14:09 jbruchon

I know this is a very old issue. If it's still an issue then let me know. I've made a ton of changes since 2019.

jbruchon avatar Jun 12 '23 00:06 jbruchon