
Consider the number of hard links for a and b when deciding how to create the hard link between a and b

Open mauromol opened this issue 3 years ago • 0 comments

This is partly related to #18, but goes beyond it.

Consider this example:

# echo abc>a
# cp a b
# ln b b1
# ln b b2
# ln b c
# stat --format="name=%n inode=%i nhardlinks=%h" a* b* c*
name=a inode=12857931 nhardlinks=1
name=b inode=12857932 nhardlinks=4
name=b1 inode=12857932 nhardlinks=4
name=b2 inode=12857932 nhardlinks=4
name=c inode=12857932 nhardlinks=4

We start with a file set in which four files share the same inode (b, b1, b2, c), while a has an inode of its own. Then run:

# rdfind -removeidentinode false -makehardlinks true ./a* ./b*
# stat --format="name=%n inode=%i nhardlinks=%h" a* b* c*
name=a inode=12857931 nhardlinks=4
name=b inode=12857931 nhardlinks=4
name=b1 inode=12857931 nhardlinks=4
name=b2 inode=12857931 nhardlinks=4
name=c inode=12857932 nhardlinks=1

Please note that:

  • c is not in the rdfind input!
  • removing -removeidentinode false gives the known "caveat" problem instead, but that is not the point here

The result is that we've broken the link between b* and c without gaining any space: there were two copies of the data before the run and there are still two afterwards. This can be a problem when you have a set of "snapshots" created with rsync, linked together with --link-dest, and you run rdfind on only some of these snapshots.
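To illustrate how such a situation could be detected up front, here is a minimal sketch that compares each inode's hard link count with the number of input names pointing at it. The function name external_links is hypothetical and not part of rdfind; it assumes GNU stat (as used above), that no path is passed twice, and that all files live on the same filesystem.

```shell
# external_links FILE...
# Hypothetical helper, not part of rdfind.
# Reports every inode that has more hard links than names in the argument list,
# i.e. inodes that are also referenced from outside the input set.
external_links() {
    # One "inode nlink" line per file; uniq -c counts the input names per inode.
    stat --format='%i %h' -- "$@" | sort | uniq -c |
    while read -r seen inode nlink; do
        if [ "$nlink" -gt "$seen" ]; then
            echo "inode $inode: $((nlink - seen)) link(s) outside the input"
        fi
    done
}
```

Called as `external_links ./a* ./b*` on the initial file set above, this should report one outside link for the inode shared by b, b1 and b2 (the one held by c).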

rdfind seems to take the first file it encounters as the target for hard link creation. If it had instead taken one of the files with the highest hard link count (b, b1 or b2), the result could have been:

name=a inode=12857932 nhardlinks=5
name=b inode=12857932 nhardlinks=5
name=b1 inode=12857932 nhardlinks=5
name=b2 inode=12857932 nhardlinks=5
name=c inode=12857932 nhardlinks=5

No links broken and space reclaimed!
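The selection rule proposed here can be sketched in shell as follows. This is only an illustration of the idea, not how rdfind is implemented: relink_to_most_linked is a hypothetical helper, it assumes GNU stat and ln, and it assumes the caller has already verified that all given files have identical content and sit on one filesystem.

```shell
# relink_to_most_linked FILE...
# Hypothetical helper, not part of rdfind.
# Assumption: all FILEs are known duplicates on the same filesystem.
relink_to_most_linked() {
    target=""
    target_links=0
    # Pass 1: pick the name whose inode has the highest hard link count.
    for f in "$@"; do
        n=$(stat --format=%h -- "$f") || return 1
        if [ "$n" -gt "$target_links" ]; then
            target=$f
            target_links=$n
        fi
    done
    target_inode=$(stat --format=%i -- "$target") || return 1
    # Pass 2: replace every name on a different inode with a link to the target.
    for f in "$@"; do
        if [ "$(stat --format=%i -- "$f")" != "$target_inode" ]; then
            ln -f -- "$target" "$f" || return 1
        fi
    done
}
```

On the initial file set, `relink_to_most_linked ./a* ./b*` would pick b as the target and relink only a, so c keeps sharing the inode. Ties between equally linked inodes are resolved in favour of the first one seen, which is an arbitrary choice in this sketch.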

mauromol, Feb 03 '22 10:02