Hardlink support

Open kmeaw opened this issue 6 years ago • 3 comments

desync does not support hardlinks:

kmeaw@kmeaw-pc /tmp $ mkdir source dest
kmeaw@kmeaw-pc /tmp $ cp /bin/busybox source
kmeaw@kmeaw-pc /tmp $ ln source/busybox source/gzip
kmeaw@kmeaw-pc /tmp $ ln source/busybox source/tar
kmeaw@kmeaw-pc /tmp $ desync tar desync.catar source/
kmeaw@kmeaw-pc /tmp $ desync untar --no-same-owner desync.catar dest/
Unpacking [=====================================================================
kmeaw@kmeaw-pc /tmp $ ls -li source
total 7944
11330916 -rwxr-xr-x 3 kmeaw kmeaw 2708080 Oct 23 19:09 busybox
11330916 -rwxr-xr-x 3 kmeaw kmeaw 2708080 Oct 23 19:09 gzip
11330916 -rwxr-xr-x 3 kmeaw kmeaw 2708080 Oct 23 19:09 tar
kmeaw@kmeaw-pc /tmp $ ls -li dest
total 7944
11331283 -rwxr-xr-x 1 kmeaw kmeaw 2708080 Oct 23 19:09 busybox
11331284 -rwxr-xr-x 1 kmeaw kmeaw 2708080 Oct 23 19:09 gzip
11331285 -rwxr-xr-x 1 kmeaw kmeaw 2708080 Oct 23 19:09 tar

Source inode numbers are the same (hardlinked), destination inode numbers are unique.
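The inode comparison above can also be done programmatically before packing. The following is a minimal Python sketch, not part of desync: the function name `find_hardlink_groups` is hypothetical, and it assumes a POSIX filesystem where hardlinked paths report the same device and inode numbers.

```python
import os
import stat
from collections import defaultdict


def find_hardlink_groups(root):
    """Group regular files under root that share a (device, inode) pair.

    Only paths inside root are compared, so a file whose other links
    live outside root is not reported.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: inspect the entry itself, not a symlink target
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # Keep only inodes that are reachable through more than one path in root.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

Run against a directory like `source` from the transcript, this should return a single group containing `busybox`, `gzip` and `tar`; run against `dest`, it should return nothing, since every file there has its own inode.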

But with tarfs things get even worse:

kmeaw@kmeaw-pc /tmp $ tar -C source -c . | desync tar --input-format tar desync.catar -
kmeaw@kmeaw-pc /tmp $ desync untar --no-same-owner desync.catar dest/
Unpacking [=====================================================================================
kmeaw@kmeaw-pc /tmp $ ls -li dest/
total 2648
11336391 -rwxr-xr-x 1 kmeaw kmeaw       0 Oct 23 19:09 busybox
11336390 -rwxr-xr-x 1 kmeaw kmeaw       0 Oct 23 19:09 gzip
11336389 -rwxr-xr-x 1 kmeaw kmeaw 2708080 Oct 23 19:09 tar

Destination gets zero-length files instead of copies.

kmeaw, Oct 23 '19 16:10

Since casync also cannot store source-filesystem hardlink information in a catar, I suggest treating tarfs hardlinks as duplicate files.

kmeaw, Oct 23 '19 16:10

https://github.com/systemd/casync/issues/183

kmeaw, Oct 24 '19 12:10

Since catar has no way to encode hardlinks (yet), desync can't support this at this point: duplicating the file would require rewinding the input stream. One workaround is to add --hard-dereference to the tar command so that the content is duplicated in the input stream.

tar --hard-dereference -C /path/to/dir -c . | desync tar --input-format tar archive.catar -

I updated the documentation to mention this limitation. I'm wondering whether desync should fail or warn when it sees a hardlink in a tar stream. That would probably be better than ending up with empty files.
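For illustration only, a check of this kind can be sketched with Python's standard `tarfile` module. This is not desync's implementation (desync is written in Go), and the function name `hardlink_members` is hypothetical. It relies on the fact that a tar archive stores a hardlink as a separate entry type that carries a link target instead of file data, which is consistent with the zero-length files seen above.

```python
import tarfile


def hardlink_members(fileobj):
    """Return (name, linkname) for every hardlink entry in a tar stream."""
    links = []
    # Mode "r|*" reads the archive sequentially, so it also works on
    # non-seekable input such as a pipe.
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            if member.islnk():  # true for hardlink entries only, not symlinks
                links.append((member.name, member.linkname))
    return links
```

A wrapper script could call `hardlink_members(sys.stdin.buffer)` on the output of `tar -C source -c .` and print a warning or exit non-zero when the list is not empty, before the stream is handed to `desync tar`.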

folbricht, Oct 26 '19 16:10