jdupes does not find some duplicates (Windows)
Hi,
I have a directory with a lot of files and file hard links.
Let's call it k:\abc; the folder names d, e, f, and g are only examples.
I run jdupes -r -t k:\abc\d
which should find duplicates, but it does not.
Then I run jdupes -r -t k:\abc\d\e\g k:\abc\d\f\g
and now duplicates are reported. I have no folder links.
This looks weird to me. Any hint as to what I am doing wrong?
I did some additional tests; the following commands also do not report duplicates:
jdupes -r -t k:\abc\d\e k:\abc\d\f
jdupes -r -t -H k:\abc\d\e k:\abc\d\f
Sorry, I don't seem to have gotten an email notification of this issue for some reason, so I'm only seeing it now.
What Windows version and edition are you running? I test mainly on 64-bit Windows 7 SP1 and Windows 10 builds 1607+. If you are on XP, Vista, or something equally old or unusual, that may be related to the issue. What kind of device is at K: and what filesystem does it use?
In the latest Windows releases, I've packaged an extra executable called jdupes-loud.exe which supports the -@ debugging parameter. Assuming your file paths are not sensitive information, could you run something like this:
jdupes-loud -@rtH k:\abc\d > log_fail.txt
jdupes-loud -@rtH k:\abc\d\e k:\abc\d\f > log_pass.txt
dir /s /a: k:\abc\d > log_files.txt
Then zip or 7z the log_XXXXX.txt files up and email them to me at [email protected]? Depending on how many files/folders there are, the output files may get extremely large, so anything you can do to reduce the number of files that must be scanned to produce pass/fail results will help me.
Hi Jody, no problem.
64-bit Windows 7 SP1, k: is an external USB 3 HD, 2TB, NTFS, some folders are NTFS-Compressed, many files are already hard linked using AllDup.
At the moment I am making a partial copy to another drive, preserving the links. After that I will be able to remove some folders to see whether the error still exists.
Then I will continue with jdupes-loud.
Thank you for your reply, Anselm
jdupes-loud -@rtH k:\abc\d > log_fail.txt
jdupes-loud -@rtH k:\abc\d\e k:\abc\d\f > log_pass.txt
I think you want to redirect the error channel, so this works for me (2> instead of >):
jdupes-loud -@rtH k:\abc\d 2> log_fail.txt
You should redirect stderr to stdout to capture both:
jdupes-loud -@rtH k:\abc\d >log_fail.txt 2>&1
I did not find a good hint in the log file.
I spent several hours reducing the number of files, moving folders out and back in. I then stopped and copied the folder to remove the hard links.
The problem still exists: the scanning process finishes, but the next step (I think it is "Processing...") does not start.
So I renamed all folders and all files to numbers.
The problem still exists. I replaced the contents of every file with one line containing 123 and gave each file the extension .txt. Processing still does not start.
Then I removed all files except five.
I started removing empty folders, but then the error went away.
Tested with an HDD and an SSD (both NTFS), on Windows 7 Pro 64-bit and Windows 7 Pro 32-bit:
c:jdupes.exe -rHt c:\jdupestest
Scanning: 5 files, 101743 items (in 1 specified)
c:jdupes.exe -rt c:\jdupestest
Scanning: 5 files, 101743 items (in 1 specified)
c:jdupes.exe -r c:\jdupestest
Scanning: 5 files, 101743 items (in 1 specified)
I will create a 7z archive with the complete folder structure and try to attach it here:
There are no files in that folder structure to find as duplicates, unless I missed something. I need the debug logs to figure it out; they'll tell me what comparisons are done and what is skipped and why it was skipped.
There are five duplicate files in it.
C:\jdupestest>dir /s /a *.txt
Volume in Laufwerk C: hat keine Bezeichnung.
Volumeseriennummer: CCAD-8D51

Verzeichnis von C:\jdupestest\113\9999__
05.07.2019 09:17 6 80002.txt
1 Datei(en), 6 Bytes

Verzeichnis von C:\jdupestest\156\29374
05.07.2019 09:17 6 934432.txt
1 Datei(en), 6 Bytes

Verzeichnis von C:\jdupestest\156\30495
05.07.2019 09:17 6 936926.txt
1 Datei(en), 6 Bytes

Verzeichnis von C:\jdupestest\233\48792
05.07.2019 09:18 6 80000.txt
1 Datei(en), 6 Bytes

Verzeichnis von C:\jdupestest\282\53141\77280_
05.07.2019 09:20 6 800004.txt
1 Datei(en), 6 Bytes

Anzahl der angezeigten Dateien:
5 Datei(en), 30 Bytes
I will create a log file.
OK, the log doesn't seem to indicate that the files were ever compared, but I did a fresh build and this is what I get with the provided data set:
C:\jdupestest>jdupes -nr .
Scanning: 0 files, 9365 dir^C(in 1 specified)
C:\jdupestest>jdupes -nrq .
.\113\9999__\80002.txt
.\156\29374\934432.txt
.\156\30495\936926.txt
.\233\48792\80000.txt
.\282\53141\77280_\800004.txt
C:\jdupestest>jdupes -nrHtq .
.\113\9999__\80002.txt
.\156\29374\934432.txt
.\156\30495\936926.txt
.\233\48792\80000.txt
.\282\53141\77280_\800004.txt
C:\jdupestest>jdupes -v
jdupes 1.13.1 (2019-06-10) 64-bit i32
Compile-time extensions: windows unicode noperm nosymlink
...
I am doing this on Windows 10 1903 64-bit, but Windows 7 should not be any different.
What version are you using? Did you grab a binary from here or did you build your own?
jdupes -v
I downloaded the binary from here.
jdupes.exe -v
jdupes 1.13.1 (2019-06-10) 32-bit
Compile-time extensions: windows unicode noperm nosymlink
Copyright (C) 2015-2019 by Jody Bruchon
Can you download the 64-bit binary instead and see if it works?
I started with the 64-bit version on the other PC. I can try the test again.
I ran the test with the 64-bit version.
jdupes finds the duplicates if jdupestest.7z is inside the c:\jdupestest folder.
After unpacking jdupestest.7z to a fresh folder, the "error" is there. Removing jdupestest.7z from the folder c:\jdupestest does not help.
Weird!
So now I have two folders whose contents should be identical:
c:\jdupestest (jdupes does not find duplicates)
c:\jdupestest.old (jdupes finds duplicates)
Now I am copying all folders inside c:\jdupestest.old with Explorer to the new folder c:\jdupestest.old2:
c:\jdupestest.old2 (jdupes does not find duplicates)
OK, here's what I'd like. Get me loud debug output logs, but get them by cding into the folder first, then check for the failure in that directory (jdupes -rq .), then do jdupes -@rq . >logX.txt 2>&1 where X is a number to differentiate between the logs. What I will do is run a diff -Nau over the two log files to find where things are falling apart, and possibly WHY things are falling apart. If you could shoot a zip of the two logs (no need to 7z them) I'll look over them.
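For anyone without a diff tool on Windows, the same kind of comparison can be sketched in Python with the standard difflib module. This is only an illustration: the helper name diff_logs, the file names log1.txt and log2.txt, and the sample lines are placeholders, not real jdupes-loud output.

```python
import difflib


def diff_logs(first_log, second_log, context=3):
    """Return unified-diff lines between two logs given as lists of lines."""
    return list(difflib.unified_diff(
        first_log, second_log,
        fromfile="log1.txt", tofile="log2.txt",  # hypothetical file names
        n=context, lineterm=""))


# Placeholder log contents, NOT real jdupes-loud output.
working = ["step 1", "went left", "step 3"]
failing = ["step 1", "went right", "step 3"]

for line in diff_logs(working, failing):
    print(line)
```

Lines starting with - appear only in the first log and lines starting with + only in the second, so the first such pair marks where the two runs diverge. For real logs, read each file with open(...).read().splitlines() and pass the two lists in.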
What is the difference between "traverse item right" and "traverse item left"?
traverse item right '.\100\302'
traverse item left '.\100\302'
Traversal refers to the binary tree of files built during the first phase where dirs and files get counted up. The algorithm is supposed to perform a depth-first traversal with each file entry along the way as the "first" file in a match check. From there, every file AFTER the "first" one in the tree is the "second" file which is checked against that one for a match. If there is a match, the match is registered against the first file's entry in the file tree, and the search continues. It goes until every file in the tree has been checked against every other file in the tree. A "traverse left" means to follow the left node of the current entry in the tree while a "traverse right" follows the right node.
For reference, an extremely simplistic file tree looks roughly like this:
      O
     / \
    O   O
   / \
  O   O
Since the scan runs top-to-bottom and then left-to-right, traversal generally goes like this: if a left node exists, follow it; else if a right node exists, follow it; else process the current node. Once all L/R nodes under a node are traversed and processed, their parent node is processed on the way back up the tree, and the parent node is an L or R node for some other node unless it's the root node at the very top, so this process continues until the root node is the only one left unprocessed.
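The rule described above (left subtree, then right subtree, then the node itself) is what is usually called a post-order traversal. Here is a minimal Python sketch of it, not the actual jdupes code: the Node class, the node names, and the assumption that the two lowest nodes of the example tree hang off the left child are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def traverse(node: Optional[Node], processed: List[str]) -> None:
    """Follow the left node, then the right node, then process this node."""
    if node is None:
        return
    traverse(node.left, processed)   # "traverse left"
    traverse(node.right, processed)  # "traverse right"
    processed.append(node.name)      # process on the way back up


# Five-node example tree; the shape is an assumption (see lead-in).
tree = Node("root",
            left=Node("left",
                      left=Node("left-left"),
                      right=Node("left-right")),
            right=Node("right"))

order: List[str] = []
traverse(tree, order)
print(order)  # ['left-left', 'left-right', 'left', 'right', 'root']
```

The root is processed last, which matches the description that the process continues until the root node is the only one left unprocessed.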
If you are getting a right traversal instead of a left traversal and that's the difference between a match and no match, jdupes is not traversing properly. I've suspected that the core algorithm had some traversal issues for a while and this would only confirm my suspicions. (I did not write the core algorithm, it was inherited from fdupes.) I am rewriting jdupes from scratch because the core algorithm is not ideal anyway; in particular, because matching work is handled on a "match pair" basis and exclusions are done on pairs but not entire match sets, there are problems that arise where a pair is explicitly no-matched but then the no-match pair files both match a third file and get chained into a combined match set indirectly because of the third file.
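The chaining problem can be shown with a small Python sketch. It is not how jdupes stores matches; it uses a simple union-find structure as a stand-in for "matched pairs get merged into sets", and the function name chain_pairs, the file names A, B, C, and the always-true match function are hypothetical.

```python
from itertools import combinations


def chain_pairs(files, is_match, excluded):
    """Merge matched pairs into sets; exclusions are only checked per pair."""
    parent = {f: f for f in files}

    def find(f):
        # Walk up to the set representative, shortening the path as we go.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in combinations(files, 2):
        if frozenset((a, b)) in excluded:
            continue  # this pair is explicitly "no match"
        if is_match(a, b):
            parent[find(b)] = find(a)  # chain b's set onto a's set

    groups = {}
    for f in files:
        groups.setdefault(find(f), []).append(f)
    return [g for g in groups.values() if len(g) > 1]


# A and B are excluded as a pair, but both match C.
sets = chain_pairs(["A", "B", "C"],
                   is_match=lambda a, b: True,
                   excluded={frozenset(("A", "B"))})
print(sets)  # [['A', 'B', 'C']]
```

A and B end up in the same set even though the pair (A, B) was never accepted, because each was chained to C separately. Checking exclusions against the whole set before merging would avoid this.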
Ah, okay, it is a binary tree built by the program.
Hi Jody, do you have an idea when a first version of your rewritten jdupes will be ready?
My time to work on it is increasingly limited. I don't know when I'll be able to put out a 2.0.
I know this is a very old issue. If it's still an issue then let me know. I've made a ton of changes since 2019.