clone hardlinks: FIDEDUPERANGE returned error: (22): Invalid argument

Open endolith opened this issue 3 years ago • 11 comments

I ran

rmlint --types="duplicates" --config=sh:handler=clone

and then ran the generated rmlint.sh. It successfully combined a number of files but failed with this error on many others. I see no obvious difference in the file names that would throw off the command line. For example, these two files are identical:

> cmp ErFLQmkWMAMTWFu.jpeg ErFLQmkWMAMTWFu.jpg

But rmlint can't combine them:

> rmlint --dedupe ErFLQmkWMAMTWFu.jpeg ErFLQmkWMAMTWFu.jpg
ERROR: lib/session.c:331: FIDEDUPERANGE returned error: (22): Invalid argument

This happens in both fish and bash.

> rmlint --version
version 2.9.0 compiled: Dec 31 2019 at [22:27:25] "Odd Olm" (rev 2)
compiled with: +mounts +nonstripped +fiemap +sha512 +bigfiles +intl +replay +xattr +btrfs-support

rmlint was written by Christopher <sahib> Pahl and Daniel <SeeSpotRun> Thomas.
The code at https://github.com/sahib/rmlint is licensed under the terms of the GPLv3.

> uname -a
Linux 5.4.0-60-generic #67-Ubuntu SMP Tue Jan 5 18:31:36 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Ohhh, you know what it probably is? I already merged a bunch of files as hardlinks first. These unreflinkable files have the same inode:

> ls -il ErFL*
8172271 -rw-rw-r-- 3 endolith endolith 756441 Jan 14 16:36 ErFLlquVEAEeXvX.jpg
8206761 -rw-rw-r-- 5 endolith endolith 644501 Jan 14 16:36 ErFLQmkWMAMTWFu.jpeg
8206761 -rw-rw-r-- 5 endolith endolith 644501 Jan 14 16:36 ErFLQmkWMAMTWFu.jpg
8172275 -rw-rw-r-- 3 endolith endolith  62767 Jan 14 16:36 ErFLVW3UcAETeps.jpg
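The same-inode condition visible in the `ls -il` output can also be checked programmatically. Below is a minimal Python sketch, not part of rmlint; the helper names `same_inode` and `demo` and the file names are made up for illustration. It compares the device and inode numbers of two paths, which is what identifies them as hardlinks of one file.

```python
import os
import tempfile


def same_inode(path_a, path_b):
    """Return True if both paths refer to the same inode on the same device."""
    st_a = os.stat(path_a)
    st_b = os.stat(path_b)
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)


def demo():
    """Build a hardlink and an independent copy, then compare both to the original."""
    with tempfile.TemporaryDirectory() as tmp:
        orig = os.path.join(tmp, "picture.jpeg")  # hypothetical file names
        link = os.path.join(tmp, "picture.jpg")
        copy = os.path.join(tmp, "other.jpg")
        with open(orig, "wb") as f:
            f.write(b"data")
        os.link(orig, link)            # hardlink: second name for the same inode
        with open(copy, "wb") as f:    # separate file with identical content
            f.write(b"data")
        return same_inode(orig, link), same_inode(orig, copy)
```

Here `demo()` should report that the hardlink shares an inode with the original while the independent copy does not.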

endolith avatar Jan 16 '21 18:01 endolith

> rmlint --dedupe ErFLQmkWMAMTWFu.jpeg ErFLQmkWMAMTWFu.jpg
ERROR: lib/session.c:331: FIDEDUPERANGE returned error: (22): Invalid argument

Just checking: are the files on a reflink-capable filesystem (btrfs or XFS)?

"Ohhh, you know what it probably is? I already merged a bunch of files as hardlinks first. These unreflinkable files have the same inode:"

Shouldn't matter

SeeSpotRun avatar Jan 17 '21 07:01 SeeSpotRun

Yes, btrfs. I removed the hardlinks and it works fine now

endolith avatar Jan 17 '21 13:01 endolith

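The underlying problem is that hardlinks are multiple names for a single inode and therefore share metadata, while reflinks are separate inodes that share only data. So it's not possible to do an in-place dedupe of a hardlink without first creating a new inode.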

From the shell, the conversion needs two commands (the first cp below shows the direct attempt failing):

$ echo data > original
$ ln original hardlink
$ cp --reflink=always original hardlink 
cp: 'original' and 'hardlink' are the same file
$ cp --reflink=always original temp_clone
$ mv temp_clone hardlink

But it probably should be possible to convert hardlinks to reflinks if you want to... I can put in a fix so that this can be done with a single command:

$ rmlint --dedupe <original> <hardlink>

Unless someone can think of a reason why this might be a bad idea?

Alternatively, I could change the reported error to something like "Can't convert hardlinks to reflinks" instead of the unhelpful "FIDEDUPERANGE returned error: (22): Invalid argument".

SeeSpotRun avatar Jan 17 '21 22:01 SeeSpotRun

Well, my report is mostly about the unhelpful error message, but yes, being able to convert hardlinks into reflinks would be helpful: https://superuser.com/q/1618201/13889

endolith avatar Jan 17 '21 23:01 endolith

Not as trivial as I expected but ready for testing if you or @sahib are interested: https://github.com/SeeSpotRun/rmlint/tree/clone_hardlinks

SeeSpotRun avatar Jan 18 '21 03:01 SeeSpotRun

I haven't gone any further with this because it can't be done atomically, so there is some risk that the hardlink gets deleted or renamed during the conversion. I could do a bit more work to make it more robust but will wait to see if there is any more interest here.

SeeSpotRun avatar Mar 14 '21 02:03 SeeSpotRun

All I cared about was the confusing error message. I'm fine with it skipping hardlinks. It would be better if it just detected them and didn't bother putting them in the list, though? Maybe there's an option for that that I missed.

endolith avatar Mar 14 '21 03:03 endolith

"Maybe there's an option for that that I missed."

Yes there is!

$ man rmlint | grep -A 6 "\-L"
       -l --hardlinked (default) / --keep-hardlinked / -L --no-hardlinked
              Hardlinked files are treated as duplicates by default (--hardlinked). If --keep-hardlinked is given, rmlint will not delete any files that are hardlinked to an original in their respective group. Such files  will  be
              displayed  like  originals,  i.e.  for the default output with a "ls" in front.  The reasoning here is to maximize the number of kept files, while maximizing the number of freed space: Removing hardlinks to originals
              will not allocate any free space.

              If --no-hardlinked is given, only one file (of a set of hardlinked files) is considered, all the others are ignored; this means, they are not deleted and also not even shown in the output. The "highest ranked" of the
              set is the one that is considered.

Example:

$ mkdir test
$ dd if=/dev/urandom of=test/orig bs=1k count=8
$ cp test/orig test/copy
$ cp --reflink=always test/orig test/reflink
$ ln test/orig test/hardlink

$ rmlint test -o pretty -S m
    ls '/home/foo/Git/rmlint/test/orig'
    rm '/home/foo/Git/rmlint/test/hardlink'
    rm '/home/foo/Git/rmlint/test/copy'
    rm '/home/foo/Git/rmlint/test/reflink'
$ rmlint test -o pretty -S m --keep-hardlinked
    ls '/home/foo/Git/rmlint/test/orig'
    ls '/home/foo/Git/rmlint/test/hardlink'
    rm '/home/foo/Git/rmlint/test/copy'
    rm '/home/foo/Git/rmlint/test/reflink'
$ rmlint test -o pretty -S m --no-hardlinked
    ls '/home/foo/Git/rmlint/test/orig'
    rm '/home/foo/Git/rmlint/test/copy'
    rm '/home/foo/Git/rmlint/test/reflink'
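The effect of --no-hardlinked (only one file per hardlinked set is considered) can be sketched in a few lines of Python. This is only an illustration, not rmlint's code: the function name `one_per_hardlink_set` is made up, and it simply keeps the first path seen for each inode, whereas rmlint keeps the "highest ranked" one.

```python
import os


def one_per_hardlink_set(paths):
    """Keep a single representative path for every (device, inode) pair.

    Assumption: the first path encountered wins. rmlint itself picks the
    "highest ranked" file of each hardlinked set instead.
    """
    kept = {}
    for path in paths:
        st = os.stat(path)
        # setdefault only stores the path if this inode was not seen before
        kept.setdefault((st.st_dev, st.st_ino), path)
    return list(kept.values())
```

With the test directory above, passing orig, hardlink and copy in that order would return orig and copy, because hardlink shares its inode with orig and is dropped.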

SeeSpotRun avatar Mar 22 '21 06:03 SeeSpotRun

I did manage to write a reasonably atomic implementation that converts a hardlink to a reflink. It's not elegant, but it works.

Basically, rmlint --dedupe original hardlink will clone original to a tempfile hardlink.XXXXXX, then atomically rename hardlink.XXXXXX to hardlink. In the worst case, a crash would leave an extra hardlink.XXXXXX file behind.
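For readers who want to see the clone-then-rename idea in code, here is a rough Python sketch. It is not rmlint's actual C implementation. The names `ficlone` and `replace_with_clone` are made up, the FICLONE request number is the value from `<linux/fs.h>` on common Linux architectures, and the clone step only works on a reflink-capable filesystem such as btrfs or XFS.

```python
import fcntl
import os
import tempfile

# Linux FICLONE ioctl request (_IOW(0x94, 9, int)); assumed value for x86/ARM.
FICLONE = 0x40049409


def ficlone(src_path, dst_fd):
    """Make the file behind dst_fd share its data extents with src_path."""
    with open(src_path, "rb") as src:
        fcntl.ioctl(dst_fd, FICLONE, src.fileno())


def replace_with_clone(original, hardlink, clone=ficlone):
    """Replace `hardlink` with a new inode whose data is cloned from `original`.

    The clone is written to a temp file next to `hardlink` and then renamed
    over it, so `hardlink` always names either the old or the new file.
    Note: mode, ownership and timestamps are NOT copied in this sketch.
    """
    directory = os.path.dirname(os.path.abspath(hardlink))
    fd, tmp_path = tempfile.mkstemp(
        prefix=os.path.basename(hardlink) + ".", dir=directory
    )
    try:
        try:
            clone(original, fd)
        finally:
            os.close(fd)
        os.replace(tmp_path, hardlink)  # atomic rename within one directory
    except BaseException:
        os.unlink(tmp_path)  # do not leave the temp file behind on failure
        raise
```

The `clone` parameter is only there so the rename logic can be tried on filesystems without reflink support, for example by passing a function that copies the bytes instead of calling the ioctl.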

Merged into https://github.com/sahib/rmlint/tree/develop and closing issue.

SeeSpotRun avatar Mar 27 '21 01:03 SeeSpotRun

cp --reflink can't convert hardlinks to reflinks either. We should implement atomic un-hardlinking in rmlint.sh and make it clear in the documentation that this is something rmlint will do (users might assume vanilla cp --reflink / FICLONE behavior and want to keep their hardlinks).

$ mkdir testdir
$ echo xxx >testdir/a
$ ln testdir/a testdir/b
$ rmlint -o sh:rmlint.sh -c sh:handler=reflink testdir
$ ./rmlint.sh -dxq
Keeping:  /tmp/testdir/a
Reflinking to original: /tmp/testdir/b
cp: '/tmp/testdir/a' and '/tmp/testdir/b' are the same file
Done!

cebtenzzre avatar Aug 19 '22 02:08 cebtenzzre

I still encounter the same problem when trying to reflink a hardlink, and I don't know why. I am running rmlint version 2.10.2.

amalgame21 avatar Oct 11 '23 22:10 amalgame21