rmlint
clone hardlinks: FIDEDUPERANGE returned error: (22): Invalid argument
I ran rmlint --types="duplicates" --config=sh:handler=clone and then ran the generated rmlint.sh. It successfully combines a bunch of files, but also fails with this error on a bunch of others. I see no obvious difference between the file names that would throw off the command line. For example, these two files are the same:
> cmp ErFLQmkWMAMTWFu.jpeg ErFLQmkWMAMTWFu.jpg
But rmlint can't combine them:
> rmlint --dedupe ErFLQmkWMAMTWFu.jpeg ErFLQmkWMAMTWFu.jpg
ERROR: lib/session.c:331: FIDEDUPERANGE returned error: (22): Invalid argument
This happens in both fish and bash.
> rmlint --version
version 2.9.0 compiled: Dec 31 2019 at [22:27:25] "Odd Olm" (rev 2)
compiled with: +mounts +nonstripped +fiemap +sha512 +bigfiles +intl +replay +xattr +btrfs-support
rmlint was written by Christopher <sahib> Pahl and Daniel <SeeSpotRun> Thomas.
The code at https://github.com/sahib/rmlint is licensed under the terms of the GPLv3.
> uname -a
Linux 5.4.0-60-generic #67-Ubuntu SMP Tue Jan 5 18:31:36 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Ohhh, you know what it probably is? I already merged a bunch of files as hardlinks first. These unreflinkable files have the same inode:
> ls -il ErFL*
8172271 -rw-rw-r-- 3 endolith endolith 756441 Jan 14 16:36 ErFLlquVEAEeXvX.jpg
8206761 -rw-rw-r-- 5 endolith endolith 644501 Jan 14 16:36 ErFLQmkWMAMTWFu.jpeg
8206761 -rw-rw-r-- 5 endolith endolith 644501 Jan 14 16:36 ErFLQmkWMAMTWFu.jpg
8172275 -rw-rw-r-- 3 endolith endolith 62767 Jan 14 16:36 ErFLVW3UcAETeps.jpg
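A quick way to confirm this before running rmlint --dedupe is to test whether two paths resolve to the same inode. This is a minimal sketch, assuming bash (the -ef test operator is not POSIX); the helper name same_inode is made up for illustration and is not part of rmlint.

```shell
#!/usr/bin/env bash
# same_inode is a hypothetical helper, not part of rmlint.
# It succeeds when both paths refer to the same device and inode,
# i.e. they are hardlinks of each other (or literally the same path).
# Note: -ef follows symlinks, so a symlink to a file also counts as "same".
same_inode() {
    [ "$1" -ef "$2" ]
}

# Example with the two files from this report.
if same_inode ErFLQmkWMAMTWFu.jpeg ErFLQmkWMAMTWFu.jpg; then
    echo "already hardlinked"
else
    echo "separate inodes"
fi
```

For the two files listed above, which share inode 8206761, this should take the first branch.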
> rmlint --dedupe ErFLQmkWMAMTWFu.jpeg ErFLQmkWMAMTWFu.jpg
ERROR: lib/session.c:331: FIDEDUPERANGE returned error: (22): Invalid argument
Just checking, the files are on a reflinkable fs (btrfs or xfs)?
Ohhh, you know what it probably is? I already merged a bunch of files as hardlinks first. These unreflinkable files have the same inode:
Shouldn't matter
Yes, btrfs. I removed the hardlinks and it works fine now
The underlying problem is that hardlinks share metadata, while reflinks don't, so it's not possible to do an in-place dedupe without first creating a new inode.
Via shell it needs two commands:
$ echo data > original
$ ln original hardlink
$ cp --reflink=always original hardlink
cp: 'original' and 'hardlink' are the same file
$ cp --reflink=always original temp_clone
$ mv temp_clone hardlink
But it probably should be possible to convert hardlinks to reflinks if you want to... I can put in a fix so that this can be done with a single command:
$ rmlint --dedupe <original> <hardlink>
Unless someone can think of a reason why this might be a bad idea?
Alternatively, I could change the reported error to something like "Can't convert hardlinks to reflinks" instead of the unhelpful "FIDEDUPERANGE returned error: (22): Invalid argument".
Well mostly my report is about the unhelpful error message, but yes, being able to convert hardlinks into reflinks would be helpful: https://superuser.com/q/1618201/13889
Not as trivial as I expected but ready for testing if you or @sahib are interested: https://github.com/SeeSpotRun/rmlint/tree/clone_hardlinks
I haven't gone any further with this because it can't be done atomically so there is some risk that the hardlink gets deleted or renamed. I could do a bit more work to make it more robust but will wait to see if there is any more interest here.
All I cared about was the confusing error message. I'm fine with it skipping hardlinks. It would be better if it just detected them and didn't bother putting them in the list, though? Maybe there's an option for that that I missed.
Maybe there's an option for that that I missed.
Yes there is!
$ man rmlint | grep -A 6 "\-L"
-l --hardlinked (default) / --keep-hardlinked / -L --no-hardlinked
Hardlinked files are treated as duplicates by default (--hardlinked). If --keep-hardlinked is given, rmlint will not delete any files that are hardlinked to an original in their respective group. Such files will be
displayed like originals, i.e. for the default output with a "ls" in front. The reasoning here is to maximize the number of kept files, while maximizing the number of freed space: Removing hardlinks to originals
will not allocate any free space.
If --no-hardlinked is given, only one file (of a set of hardlinked files) is considered, all the others are ignored; this means, they are not deleted and also not even shown in the output. The "highest ranked" of the
set is the one that is considered.
Example:
$ mkdir test
$ dd if=/dev/urandom of=test/orig bs=1k count=8
$ cp test/orig test/copy
$ cp --reflink=always test/orig test/reflink
$ ln test/orig test/hardlink
$ rmlint test -o pretty -S m
ls '/home/foo/Git/rmlint/test/orig'
rm '/home/foo/Git/rmlint/test/hardlink'
rm '/home/foo/Git/rmlint/test/copy'
rm '/home/foo/Git/rmlint/test/reflink'
$ rmlint test -o pretty -S m --keep-hardlinked
ls '/home/foo/Git/rmlint/test/orig'
ls '/home/foo/Git/rmlint/test/hardlink'
rm '/home/foo/Git/rmlint/test/copy'
rm '/home/foo/Git/rmlint/test/reflink'
$ rmlint test -o pretty -S m --no-hardlinked
ls '/home/foo/Git/rmlint/test/orig'
rm '/home/foo/Git/rmlint/test/copy'
rm '/home/foo/Git/rmlint/test/reflink'
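To see which files in a tree already share an inode before choosing between these options, find can list them directly. This is a small sketch, assuming GNU findutils (needed for -printf); list_hardlinked is a hypothetical helper name, not an rmlint feature.

```shell
#!/usr/bin/env bash
# list_hardlinked is a hypothetical helper, not part of rmlint.
# Prints "inode link-count path" for every regular file under the given
# directory (default: current directory) that has more than one hardlink.
# Sorting by inode puts members of the same hardlink set on adjacent lines.
# Requires GNU find for -printf (%i = inode, %n = link count, %p = path).
list_hardlinked() {
    find "${1:-.}" -type f -links +1 -printf '%i %n %p\n' | sort -n
}
```

Run against the example directory above as list_hardlinked test. Note that a file is also listed when its other links live outside the searched tree, since the link count is per inode rather than per directory.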
So I did manage to write a reasonably atomic implementation to convert a hardlink to a reflink. Not elegant, but it works.
Basically, rmlint --dedupe original hardlink will clone original to a tempfile hardlink.XXXXXX, then atomically rename hardlink.XXXXXX to hardlink. So worst case, a crash would lead to an extra hardlink.XXXXXX file floating around.
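The same clone-then-rename sequence can be approximated from the shell. This is a rough sketch of the idea only, not rmlint's actual code; it assumes bash and GNU coreutils, and both the helper name hardlink_to_reflink and the REFLINK_MODE variable are invented for this example.

```shell
#!/usr/bin/env bash
# hardlink_to_reflink is a hypothetical helper sketching the approach
# described above; it is not rmlint's implementation.
# Usage: hardlink_to_reflink ORIGINAL HARDLINK
# REFLINK_MODE (an assumption of this sketch) defaults to "always", so the
# copy fails instead of silently duplicating data on a non-reflink filesystem.
hardlink_to_reflink() {
    local original=$1 hardlink=$2 tmp
    # Create the tempfile next to the target so the rename stays on one filesystem.
    tmp=$(mktemp "${hardlink}.XXXXXX") || return 1
    # Clone the data into the tempfile, keeping mode, ownership and timestamps.
    if ! cp -p --reflink="${REFLINK_MODE:-always}" -- "$original" "$tmp"; then
        rm -f -- "$tmp"
        return 1
    fi
    # Within one filesystem, mv uses rename(2), which swaps the directory
    # entry atomically. -T refuses to treat the target as a directory.
    mv -fT -- "$tmp" "$hardlink"
}
```

If the process dies between the cp and the mv, the only leftover is the hardlink.XXXXXX tempfile; the original hardlink stays intact until the rename succeeds.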
Merged into https://github.com/sahib/rmlint/tree/develop and closing issue.
cp --reflink can't convert hardlinks to reflinks either. We should implement atomic un-hardlinking in rmlint.sh and make it clear in the documentation that this is something rmlint will do (users might assume vanilla cp --reflink / FICLONE behavior and want to keep their hardlinks).
$ mkdir testdir
$ echo xxx >testdir/a
$ ln testdir/a testdir/b
$ rmlint -o sh:rmlint.sh -c sh:handler=reflink testdir
$ ./rmlint.sh -dxq
Keeping: /tmp/testdir/a
Reflinking to original: /tmp/testdir/b
cp: '/tmp/testdir/a' and '/tmp/testdir/b' are the same file
Done!
I still encounter the same problem when trying to reflink a hardlink, and I don't know why. I am running rmlint version 2.10.2.