README Caveats regarding collapsing multiple existing hardlinks contains confusing detail
I've just started experimenting with this tool and am trying to understand its behavior clearly (I intend to integrate it with dirvish). I got to the README's "Caveats / Features" section, which is pertinent to my use case, but was confused by the example provided in README.md:
Before:
$ echo abc>a
$ ln a a1
$ ln a a2
$ cp a b
$ ln b b1
$ ln b b2
$ stat --format="name=%n inode=%i nhardlinks=%h" a* b*
name=a inode=18 nhardlinks=3
name=a1 inode=18 nhardlinks=3
name=a2 inode=18 nhardlinks=3
name=b inode=19 nhardlinks=3
name=b1 inode=19 nhardlinks=3
name=b2 inode=19 nhardlinks=3
Run rdfind:
$ rdfind -removeidentinode true -makehardlinks true ./a* ./b*
After:
$ stat --format="name=%n inode=%i nhardlinks=%h" a* b*
name=a inode=58930 nhardlinks=4
name=a1 inode=58930 nhardlinks=4
name=a2 inode=58930 nhardlinks=4
name=b inode=58930 nhardlinks=4
name=b1 inode=58931 nhardlinks=2
name=b2 inode=58931 nhardlinks=2
What I find confusing is that the inodes associated with ALL of the files have seemingly been changed by rdfind. My understanding is that in this situation rdfind would hard-link b to the inode already associated with a (and a1 and a2), i.e. without changing a (et al.) or its associated inode. But the example shows that both the inode associated with a (et al.) and the inode associated with the files still hard-linked to the former b have been changed by rdfind (from 18 to 58930, and from 19 to 58931, respectively).
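To make the expected behavior concrete, here is a minimal sketch that rebuilds the README's "Before" state in a scratch directory and then does by hand what I understand `-makehardlinks true` to do in this case: replace b with a hard link to a's inode. The `ln -f a b` step is my own stand-in for the net effect; it is an assumption, not rdfind's actual implementation. Like the transcripts here, it relies on GNU coreutils `stat --format`.

```shell
#!/bin/sh
# Sketch (assumption): reproduce the README "Before" state in a scratch directory,
# then emulate the expected net effect of -makehardlinks by hand with ln -f.
# This is NOT rdfind itself; it only illustrates which inodes should survive.
set -eu
dir=$(mktemp -d)
cd "$dir"

echo abc > a
ln a a1
ln a a2
cp a b
ln b b1
ln b b2

# Remember the original inode numbers so they can be compared afterwards.
orig_a=$(stat --format=%i a)
orig_b=$(stat --format=%i b)
echo "Before:"
stat --format="inode=%i nhardlinks=%h name=%n" a* b* | sort

# Replace "b" with a hard link to a's inode (ln -f removes the old "b" first).
ln -f a b

echo "After:"
stat --format="inode=%i nhardlinks=%h name=%n" a* b* | sort
echo "a: original inode $orig_a, now $(stat --format=%i a)"
echo "b1: original inode $orig_b, now $(stat --format=%i b1)"
```

With this sketch, a, a1, a2 and b end up sharing a's original inode with 4 links, while b1 and b2 keep b's original inode with 2 links. That matches my 1.5.0 output rather than the inode numbers in the README.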
I ran a nearly identical experiment locally with rdfind 1.5.0 (I altered the stat command line to sort the output by inode and get tidy columnar output), and observed the behavior I expect:
Before:
$ echo abc>a
$ ln a a1
$ ln a a2
$ cp a b
$ ln b b1
$ ln b b2
$ stat --format="inode=%i nhardlinks=%h name=%n" * | sort
inode=6423301 nhardlinks=3 name=a
inode=6423301 nhardlinks=3 name=a1
inode=6423301 nhardlinks=3 name=a2
inode=6423302 nhardlinks=3 name=b
inode=6423302 nhardlinks=3 name=b1
inode=6423302 nhardlinks=3 name=b2
Run rdfind:
$ rdfind -removeidentinode true -makehardlinks true ./a* ./b*
<elided>
After:
$ stat --format="inode=%i nhardlinks=%h name=%n" * | sort
inode=6423301 nhardlinks=4 name=a
inode=6423301 nhardlinks=4 name=a1
inode=6423301 nhardlinks=4 name=a2
inode=6423301 nhardlinks=4 name=b
inode=6423302 nhardlinks=2 name=b1
inode=6423302 nhardlinks=2 name=b2
inode=6423303 nhardlinks=1 name=results.txt
$
Note that neither the inode associated with a* nor the inode associated with b1 and b2 was changed by rdfind. This is the behavior I would expect given my understanding of how rdfind operates with -makehardlinks true.
If the behavior shown by my example is intended (specifically, that rdfind does not change the inodes of the files it creates new hard links to), I believe the README example should be corrected to avoid confusing future readers.
Alternatively, if the behavior currently shown in README.md is intended, an explanation for that behavior would be helpful to my understanding of rdfind.
Sorry, I can't see any difference in the input between the two examples, aside from the difference in your "stat" command. Why did you get two different results if you ran the same command? Why do you call them "near identical" when they look identical:
$ rdfind -removeidentinode true -makehardlinks true ./a* ./b*
$ rdfind -removeidentinode true -makehardlinks true ./a* ./b*
Or are you saying that the different "stat" command caused rdfind to behave differently? I think I'm missing something.
The "... difference in the stat command [output]" (and specifically in the inode numbers shown in the stat output, and what they imply about how rdfind behaves) is central to my question.
My working assumption is that stat does not modify any filesystem state. The stat output in the README.md example quoted in my initial post shows rdfind changing the hard links of all 6 files in question. Furthermore, after rdfind runs, all 6 files are hard-linked to entirely different inodes (i.e. different copies of the file content, which rdfind has found to be identical) than those in play before it ran: before, inodes 18 and 19; after, inodes 58930 and 58931. How and why did the latter pair of inodes come into the picture? Did rdfind copy file content to create them? Surely not. And why are inodes 18 and 19 no longer referenced after rdfind runs?
What I believe should happen, as demonstrated by the identical experiment I ran on my own host (whose results I also supplied in my initial post), is that in this scenario rdfind modifies only a single hard link (that of file b), and that the pair of inodes in play before rdfind runs are also the only inodes in play afterwards.
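One way to check the claim that only the pre-existing inodes should remain in play is to compare the set of inode numbers before and after the deduplication step. The sketch below does that with `stat` and `comm`. The `inode_set` helper is hypothetical (my own), and the `ln -f a b` line is only a placeholder for the real `rdfind -removeidentinode true -makehardlinks true ./a* ./b*` invocation, which you would substitute when testing rdfind itself. The before/after lists are written outside the scratch directory so the `./a*` and `./b*` globs cannot pick them up.

```shell
#!/bin/sh
# Sketch: verify that a dedup step only reuses inodes that already existed.
set -eu
dir=$(mktemp -d)
before=$(mktemp)   # inode list files live outside $dir, away from the globs
after=$(mktemp)
cd "$dir"
echo abc > a; ln a a1; ln a a2
cp a b; ln b b1; ln b b2

# Hypothetical helper: print the sorted, de-duplicated inode numbers of the given files.
inode_set() {
  stat --format=%i "$@" | LC_ALL=C sort -u
}

inode_set a a1 a2 b b1 b2 > "$before"

# Placeholder for the deduplication step; replace with e.g.
#   rdfind -removeidentinode true -makehardlinks true ./a* ./b*
ln -f a b

inode_set a a1 a2 b b1 b2 > "$after"

# comm -13 keeps only lines unique to the second file: inodes that did not exist before.
new_inodes=$(LC_ALL=C comm -13 "$before" "$after")
if [ -z "$new_inodes" ]; then
  echo "OK: every inode after the run already existed before it"
else
  echo "New inodes appeared:" $new_inodes
fi
```

Applied to the README's numbers (18 and 19 before; 58930 and 58931 after), this check would report both later inodes as new; applied to my 1.5.0 run, it would report none.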
My best guess is that this is a simple (albeit subtle) documentation error in README.md. The inode numbers shown by the example's stat commands are not incidental to the example; rather, they are key to explaining rdfind's behavior in -makehardlinks true mode, which is the goal of the example.
But I am open to the possibility that something I'm completely missing justifies the wholesale change of inode numbers by rdfind shown in the README.md example...