Reflink/Copy-on-Write support

Open jantatje opened this issue 3 years ago • 4 comments

It would be nice to have an option to enable CoW for copy operations; as of version 9.0, cp from GNU coreutils does this by default. Reflinking both saves space and makes copying within the same filesystem faster. There should be an option to turn this on or off: if enabled, lf should first try to reflink the file and, if that fails, fall back to copying normally. In my opinion, it makes the most sense to default this setting to true.

jantatje avatar Oct 27 '21 10:10 jantatje

I agree.

Also, I would like to add that while it is possible to implement this on the user side (by redefining the paste function in lfrc, as shown in the man page), you'll lose the benefit of progress reporting in the status bar and the handling of duplicate names (.~1~ suffix).

Old buggy code

My very crude implementation:
cmd paste &{{
    set -- $(cat ~/.local/share/lf/files)
    mode="$1"
    shift
    case "$mode" in
        copy)
            # FIXME this does not work for directories
            if cp -rn --reflink=always -- "$@" .; then
                lf -remote "send $id echo reflinked!"
            else
                # cp leaves an empty file behind when the reflink fails
                if [ "$PWD/$(basename "$@")" != "$@" ]; then # OCD lmao
                    rm -rf "$PWD/$(basename "$@")"
                fi
                lf -remote "send $id paste"
            fi

            ;;
        move)
            mv -n -- "$@" .

            # FIXME
            # unlike with copy, normal paste doesn't work here for some reason
            # i must be missing something
            # lf -remote "send $id paste"
            ;;
    esac
    rm ~/.local/share/lf/files
    lf -remote "send clear"
}}

Maybe someone can come up with something better and more robust, but overall I think this should be a native feature. CoW filesystems have been around for a long time, and it would be nice for more software to take advantage of their features.

UPD Nov 12

I've made a much better workaround. @jantatje, you might find it useful.

The previous implementation above had a number of problems, but this one:

  1. Tries to use CoW on btrfs
  2. Falls back to lf's native paste if it can't
  3. Handles matching names in destination with .~1~ like lf
  4. Forwards cp errors to status line, if any
  5. Has no rm anywhere, for peace of mind
Old, but better code

cmd paste_try_cow &{{
    # # This was very helpful for debugging:
    # log_file="$HOME/lf-reflink-log-$(date +'%Y-%m-%d_%H-%M-%S')"
    # [ -f "$log_file" ] || touch "$log_file"
    # exec 1>> $log_file 2>&1
    # set -x

    # In theory, this may fail,
    # but I tested it on selection with 10k files - everything worked (bash)
    set -- $(cat ~/.local/share/lf/files)
    mode="$1"
    shift

    if [ "$mode" = 'copy' ]; then
        # Reflink if first item of selection and the destination are on the
        # same mount point and it is btrfs.
        # (to make sure reflink never fails in first place, so we don't have to
        # clean up)
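        # Only "$1" is checked: with "$@", df prints one line per file and
        # tail would pick the last item instead of the first.
        if [ "$(df --output=target -- "$PWD" | tail -n 1)" = \
             "$(df --output=target -- "$1" | tail -n 1)" ] && \
             [ "$(df --output=fstype -- "$PWD" | tail -n 1)" = btrfs ]; then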

            echo 'selected copy and cp reflink paste'

            # # Handle same names in dst
            # # TODO make this run in parallel, idk
            # This is simple, but slow
            for i in "$@"; do
                name="${i##*/}"
                original="$name"

                count=0
                # -e: the name is taken if anything exists there, writable or not
                while [ -e "$PWD/$name" ]; do
                    count=$((count+1))
                    name="$original.~$count~"
                done

                set +e
                cp_out="$(cp -rn --reflink=always -- "$i" "$PWD/$name" 2>&1)"
                set -e

                if [ ! -z "$cp_out" ]; then
                    lf -remote "send $id echoerr $cp_out"
                    exit 0
                fi
            done

            # Or just skip a file when names are the same.
            # (A LOT faster if you e.g. pasting selection of 10k files)
            # cp -rn --reflink=always -- "$@" .

            lf -remote "send clear"
            lf -remote "send $id echo reflinked!"
        else
            echo 'selected copy and lf native paste'
            lf -remote "send $id paste"
        fi

    elif [ "$mode" = 'move' ]; then
        echo 'selected move and lf native paste'
        lf -remote "send $id paste"
    fi

    # # for debug
    # set +x

    lf -remote "send load"
}}

# name is different to avoid recursive calls
map p paste_try_cow

UPD 3 Dec

Added some cosmetic changes.

The latest version now lives in the Wiki/Tips section.

MahouShoujoMivutilde avatar Nov 09 '21 22:11 MahouShoujoMivutilde

OMG, io.Copy actually seems to support CoW!

How do I know?


~/tmp ❯ time iocopy big.mkv big-iocopy-new.mkv
iocopy big.mkv big-iocopy-new.mkv  0.00s user 0.06s system 81% cpu 0.072 total

~/tmp ❯ time cp --reflink=always big.mkv big-cp-ref.mkv
cp -ir --reflink=always big.mkv big-cp-ref.mkv  0.00s user 0.02s system 96% cpu 0.024 total

~/tmp ❯ time cp --reflink=never big.mkv big-noref.mkv
cp -ir --reflink=never big.mkv big-noref.mkv  0.00s user 4.17s system 67% cpu 6.148 total

6 seconds for non-CoW cp vs less than 100ms for io.Copy and cp with CoW. WOW!

This is with go version go1.17.3 linux/amd64 on btrfs (kernel 5.15.3).

iocopy code
package main

import (
	"flag"
	"io"
	"os"
)

func main() {
	flag.Parse()
	if flag.NArg() != 2 {
		panic("usage: iocopy <src_file> <dst_file>")
	}

	src := flag.Arg(0)
	dst := flag.Arg(1)

	fin, err := os.Open(src)
	if err != nil {
		panic(err)
	}
	defer fin.Close()

	fout, err := os.Create(dst)
	if err != nil {
		panic(err)
	}
	defer fout.Close()

	_, err = io.Copy(fout, fin)

	if err != nil {
		panic(err)
	}
}

Both files actually seem to refer to the same extents:

filefrag check

~/tmp ❯ iocopy big.mkv iocopy.mkv

~/tmp ❯ filefrag -v big.mkv > big.txt

~/tmp ❯ filefrag -v iocopy.mkv > iocopy.txt

~/tmp ❯ diff -u big.txt iocopy.txt

--- big.txt	2021-11-22 21:11:49.128978402 +0300
+++ iocopy.txt	2021-11-22 21:11:57.737916530 +0300
@@ -1,5 +1,5 @@
 Filesystem type is: 9123683e
-File size of big.mkv is 4474504665 (1092409 blocks of 4096 bytes)
+File size of iocopy.mkv is 4474504665 (1092409 blocks of 4096 bytes)
  ext:     logical_offset:        physical_offset: length:   expected: flags:
    0:        0..      31:  133900158.. 133900189:     32:             encoded,shared
    1:       32..     127:  133908099.. 133908194:     96:  133900190: shared
@@ -507,4 +507,4 @@
  503:   505622..  767731:  147864832.. 148126941: 262110:  147864714: shared
  504:   767732.. 1092389:  148126976.. 148451633: 324658:  148126942: shared
  505:  1092390.. 1092408:  119582017.. 119582035:     19:  148451634: last,encoded,shared,eof
-big.mkv: 506 extents found
+iocopy.mkv: 506 extents found

Only names are different.

For reference:


~/tmp ❯ filefrag -v big-nocow.mkv > big-nocow.txt

~/tmp ❯ wc -l big.txt iocopy.txt big-nocow.txt # quite different
  510 big.txt
  510 iocopy.txt
   16 big-nocow.txt
 1036 total


Also, this is a bit different, but still related, so I'm going to comment on it here.

TIL: cifs and samba 4.1.0+ actually support server-side copy and even remote CoW (latter if share is on btrfs).

Sadly, lf's copy seems to be essentially a chunk-by-chunk read-from-source, write-to-destination kind of copy, which is the most universal solution if you want a progress percentage, but it doesn't make use of those capabilities.

But guess what? io.Copy does!

This is awesome:

/mnt/remotecow ❯ cp --reflink=always big.mkv big-2.mkv # it's instant

/mnt/remotecow ❯ iocopy big.mkv big-iocopy.mkv # also instant

/mnt/remotecow ❯ cp --reflink=never big.mkv big-nocow.mkv # ... it isn't

/mnt/remotecow took 1m37s ❯


UPD

I've looked at Go's source, here is what I've found:

io.Copy actually just calls io.copyBuffer with a nil buffer, which uses src.(WriterTo) or dst.(ReaderFrom) when possible and falls back to reading into and writing from a buffer when not.

This is probably how CoW happened to work. I haven't found where exactly it is implemented, but from digging around Go's source, it seems to support copy_file_range(), which, according to its man page:

       ...
       copy_file_range() gives filesystems an opportunity to implement "copy
       acceleration" techniques, such as the use of reflinks (i.e., two or more
       inodes that share pointers to the same copy-on-write disk blocks) or
       server-side-copy (in the case of NFS).
       ...

I don't know how to do this yet, but there might be a way to pass io.CopyBuffer some custom buffer that counts the bytes copied into it and reports progress after each .Write(), leading to:

  1. keeping the progress counter, as it is right now
  2. using io.CopyBuffer with its CoW capabilities (as shown above)

It doesn't even have to be a permanent set reflink auto: you can force io.CopyBuffer to always use the buffer, as shown here.


UPD 26 Nov

Okay, found a problem with that idea. Here is my crude test implementation that failed miserably because of it.

You see, os.File always implements ReadFrom (the ReaderFrom interface). It will try to use copy_file_range internally, and will fall back to a normal copy if that fails.

All of that without telling you what it actually did :laughing:

Figured it out the hard way.

So you see - it is too smart for its own good.

MahouShoujoMivutilde avatar Nov 22 '21 18:11 MahouShoujoMivutilde

@jantatje @MahouShoujoMivutilde This sounds cool, but it is also a little too specific. I don't think ext4, which is likely still the most common filesystem on Linux, supports CoW. Any PR for this should also work on other platforms without issue.

gokcehan avatar Dec 02 '21 16:12 gokcehan