sshfs
Read error on remote is reported as EOF
When sequentially reading a file (e.g. when copying it) and the remote host runs into a read error, sshfs/FUSE instead reports end-of-file. The local side has no way of knowing that an error occurred and will assume EOF was actually reached. This results in partial copies without any error message whatsoever. It is especially bad in conjunction with mv across filesystem boundaries, as that will unlink the source file after apparently copying it successfully.
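To make the failure mode concrete, here is a sketch (the paths are placeholders, not taken from the report): a cross-filesystem mv first copies the file and then unlinks the source, so a silently truncated copy destroys the only intact version.

# Sketch of the failure mode; ~/sshfs-mount and /local/dst are placeholder paths.
mv ~/sshfs-mount/bigfile /local/dst/
# mv crosses a filesystem boundary here, so it copies and then unlinks.
# If the remote read fails mid-file, sshfs reports EOF, the truncated
# copy "succeeds", and mv deletes the original on the remote side.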
scp properly reports the I/O error. Both the local and remote hosts are running OpenSSH 8.3p1 on Arch; the local side has sshfs 3.7.0.
Thanks for the report! Do you know a way to reliably reproduce this problem (i.e., how to trigger a read error on the remote side)?
I suspect that the problem is not with SSHFS, but with the SFTP protocol (which is not used by scp). In that case there is unfortunately nothing we can do on the SSHFS side to change this...
It should be fairly easy if you can create a throwaway checksummed filesystem. Just create a small image file, format it with e.g. BTRFS, put some files onto it and write some junk onto the backing loop device. As long as you can still mount it (or simply don't unmount it in the first place), you should now get read errors when trying to read those files.
Here's what I did to test this more thoroughly.
On Remote:
(Note that you cannot use /dev/zero instead of /dev/urandom in the dd call, because all-zero blocks are actually valid.)
% truncate -s2G temp.img
% mkfs.btrfs temp.img
btrfs-progs v5.9
See http://btrfs.wiki.kernel.org for more information.
Label: (null)
UUID: 0d3f5a61-5881-44cf-a901-85c003dfcca3
Node size: 16384
Sector size: 4096
Filesystem size: 2.00GiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 102.38MiB
System: DUP 8.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Runtime features:
Checksum: crc32c
Number of devices: 1
Devices:
ID SIZE PATH
1 2.00GiB temp.img
% sudo losetup -f --show temp.img
/dev/loop0
% sudo mount /dev/loop0 /mnt
% sudo chown $USER:$USER /mnt
% cat < /dev/zero > /mnt/testfile
cat: write error: No space left on device
% sudo dd if=/dev/urandom of=/dev/loop0 seek=64 bs=2M
dd: error writing '/dev/loop0': No space left on device
961+0 records in
960+0 records out
2013265920 bytes (2.0 GB, 1.9 GiB) copied, 9.36271 s, 215 MB/s
% cat < /mnt/testfile > /dev/null
cat: -: Input/output error
On Local:
% scp $remote:/mnt/testfile .
testfile 96% 1755MB 108.8MB/s 00:00 ETAscp: /mnt/testfile: Input/output error
testfile 100% 1826MB 109.7MB/s 00:16
% sftp $remote
Connected to $remote.
sftp> get /mnt/testfile
Fetching /mnt/testfile to testfile
/mnt/testfile 11% 219MB 104.0MB/s 00:15 ETA
Couldn't read from remote file "/mnt/testfile" : Failure
sftp> exit
% mkdir temp
% sshfs $remote:/mnt temp
% dd if=temp/testfile of=/dev/null
441320+0 records in
441320+0 records out
225955840 bytes (226 MB, 215 MiB) copied, 2.05436 s, 110 MB/s
After connecting with sftp, I made sure that the remote side actually had a new sftp-server process running. So, the issue does not seem to be with the protocol.
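(For what it's worth, a quick way to check that on the remote side; pgrep here is a generic suggestion, not taken from the original report:)

# On the remote host, list running sftp-server processes with full command lines:
pgrep -af sftp-server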
Thanks for investigating! The next step would probably be to enable sshfs_debug to get a clearer picture of what the remote host sends and what SSHFS is doing with it. It's probably easiest while enabling synchronous operation; the relevant code path is sshfs_read() -> sshfs_sync_read() -> wait_chunk(). Glancing over this, I don't see any code that would deliberately translate errors to EOF, but it may happen by accident...
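For reference, a mount command along these lines should enable both; sshfs_debug and sync_read are documented sshfs options, -f keeps sshfs in the foreground so the debug output stays visible, and the host and paths are placeholders:

# Mount with debug messages and synchronous reads (sketch):
sshfs -f -o sshfs_debug -o sync_read $remote:/mnt temp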
I can reproduce this. I attached a corrupted file system image that can be used for testing. You can mount it like so:
gunzip errimg.gz
mkdir mnt
sudo mount errimg mnt
It contains a single file, file, that is exactly 16 MiB + 12 bytes in size. The 12 additional bytes are the corrupted part. If you read it directly, you’ll get an I/O error. If you copy it over sshfs, no error is shown: the corrupted part gets silently dropped and the copy is exactly 16 MiB in size.
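A quick way to see the truncation (a sketch; stat is standard, the copy path is a placeholder, and 16 MiB is 16777216 bytes):

# Compare the size on the mounted image with the size of a copy made over sshfs:
stat -c %s mnt/file        # 16777228 = 16 MiB + 12 bytes
stat -c %s copy-of-file    # 16777216: the last 12 bytes were silently dropped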
This is what sshfs shows with debug enabled:
[00522] READ
[00523] READ
[00524] READ
[00525] READ
[00526] READ
[00527] READ
[00528] READ
[00529] READ
[00522] DATA 32781bytes (1ms)
[00523] DATA 32781bytes (2ms)
[00524] DATA 32781bytes (2ms)
[00525] DATA 32781bytes (3ms)
[00526] DATA 32781bytes (3ms)
[00527] DATA 32781bytes (4ms)
[00528] DATA 32781bytes (4ms)
[00529] DATA 32781bytes (5ms)
[00530] READ
[00530] STATUS 28bytes (1ms)
[00531] CLOSE
[00531] STATUS 28bytes (0ms)
Below is the script that I used to create the attached file. It needs the program bbe installed, a sed-like editor for binary files. It must be run as root.
#!/usr/bin/bash
# Create a 45 MiB image and put a btrfs filesystem on it.
dd if=/dev/zero of=img bs=1M count=45
DEV=$(losetup --show --find img)
mkfs.btrfs $DEV
mkdir mnt
mount $DEV mnt
# Write a 16 MiB file plus a 12-byte marker (11 characters + newline).
dd if=/dev/zero bs=1M count=16 > mnt/file
echo 'MAGICSTRING' >> mnt/file
umount mnt
losetup --detach $DEV
# Flip the marker bytes behind btrfs's back so the data checksum no longer matches.
bbe --expression='s/MAGICSTRING/ERRORSTRING/' --output=errimg img
gzip --keep errimg
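Example invocation (the script name make-errimg.sh is hypothetical; as noted above, it must run as root):

sudo ./make-errimg.sh
mkdir -p mnt && sudo mount errimg mnt
cat mnt/file > /dev/null   # should fail with an Input/output error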
This is what the transfer looks like if the file is not corrupted:
[00526] READ
[00527] READ
[00528] READ
[00529] READ
[00530] READ
[00531] READ
[00532] READ
[00533] READ
[00526] DATA 32781bytes (2ms)
[00527] DATA 32781bytes (2ms)
[00528] DATA 32781bytes (2ms)
[00529] DATA 32781bytes (3ms)
[00530] DATA 32781bytes (3ms)
[00531] DATA 32781bytes (4ms)
[00532] DATA 32781bytes (4ms)
[00533] DATA 32781bytes (5ms)
[00534] READ
[00534] DATA 25bytes (0ms)
[00535] CLOSE
[00535] STATUS 28bytes (0ms)
[00536] LSTAT
[00536] STATUS 33bytes (0ms)
[00537] LSTAT
[00537] STATUS 33bytes (0ms)
[00538] LSTAT
[00538] STATUS 33bytes (0ms)
So apparently the problem is that if a READ gets a STATUS response, that STATUS is ignored.
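If that is fixed so that the STATUS error is propagated, the earlier dd test should presumably fail loudly instead of reporting a short but apparently successful read (hypothetical output, not from the report):

dd if=temp/testfile of=/dev/null
# dd: error reading 'temp/testfile': Input/output error   (expected after a fix)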
Thanks for digging into this! Would you be able to take a look at the code and maybe fix the issue? It seems you have narrowed down what's going wrong.
(SSHFS is fully volunteer-driven. This means new features and bug fixes are implemented only when someone has a personal interest in them and therefore also does the necessary work.)