sshfs-win icon indicating copy to clipboard operation
sshfs-win copied to clipboard

Intensive I/O causes resource unavailable

Open er1z opened this issue 5 years ago • 37 comments

My setup:

  • Ubuntu 18.04.1 LTS in Hyper-V,
  • Windows 10 1803 17134.1,
  • SSHFS: sshfs-win-2.7.17334-x64.msi,
  • WinFSP: winfsp-1.3.18160.msi.

I establish a connection to the resource using the official way via „map network drive”. Connection is done via host-only interface (not the NAT one). On the Ubuntu I've created some Symfony projects and want to use this on host's PHPStorm instance. Starting and indexing works perfect until — via some reason — the mapped drive's contents completely disappears with a message „access denied” under Explorer.

I've tried to investigate this issue a bit and found out that tweaking sshd params has no affect on this. In Windows events log for WinFSP there's many entries like:

sshfs: The service sshfs has been stopped.
launcher-x64: terminated sshfs\[email protected]\..\..\workspace
launcher-x64: create sshfs\[email protected]\..\..\workspace
sshfs: The service sshfs has been started.

and these appear like within the loop.

I'm pretty fresh to WinFSP (and amazed by iops it does) so if is there any additional debug information needed, point what to do, please.

er1z avatar Sep 11 '18 21:09 er1z

Please check whether you have any user mode crash dumps (usually found in C:\Users\USER\AppData\Local) or any Cygwin stack dumps (likely found in \Program Files\SSHFS-Win).

Also try the latest SSHFS-Win beta, which is based on latest version of SSHFS with lots of improvements.

billziss-gh avatar Sep 11 '18 21:09 billziss-gh

No crashdumps, no difference on latest betas. Sometimes Windows just shows the mapped drive with red cross and it comes back to the green after a few seconds.

It's very disturbing because if directory is unavailable, PHPStorm „closes” the project and needs to be re-indexed from scratch. :/

Right now I've noticed an error: „An error occurred while reconnecting Y: to (...) Windofs FileSystem Proxy: The network is not present or not started. This connection has not been restored.”

To clarify, network connection is stable because SMB share is working ok to the same host.

er1z avatar Sep 11 '18 21:09 er1z

Curious that you do not have any dumps if the app crashes.

See here for collecting User-Mode Dumps on Windows.

Cygwin also creates a "stackdump" file when a Cygwin app crashes. I am not sure why this is missing.

billziss-gh avatar Sep 11 '18 22:09 billziss-gh

Dumps for other apps were present.

May it's not a crash? I've searched against this phrase within user profile and program files folders + temp and found nothing. :(

May be it is a lag for re-spawning a process? Or should the one be kept in memory as long as possible?

er1z avatar Sep 11 '18 22:09 er1z

Now I got this:

image

er1z avatar Sep 11 '18 22:09 er1z

Windows and SSHFS-Win will not unmount without explicit user interaction (i.e. disconnect network drive) (although I have not tested against Win 10 1803).

I am wondering if the SSHFS server could cause these problems somehow.

Are you able to launch sshfs.exe from the command line and observe the behavior there?

billziss-gh avatar Sep 11 '18 22:09 billziss-gh

But it seems it's still mounted, just completely unavailable.

I tried mounting via SiriKali and result is pretty the same.

Using command line happens something weird:

"c:\Program Files\SSHFS-Win\bin\sshfs.exe" -o create_umask=007,uid=-1,gid=-1,cache=yes,cache_max_size=2000000,cache_timeout=120,follow_symlinks,compression=no u@host:/dir Y: u@host's password: read: Connection reset by peer

But server seems to work ok because I have opened a regular SSH session to this host and works ok. I think the error is produced by returning the sftp back-end to the stdin where nothing happens.

If I use a ssh.exe from SSHFS-Win and sign-in session is opened correctly.

er1z avatar Sep 12 '18 07:09 er1z

Aha. Do you have more than one SSH program installed? Others have reported similar problems where SSHFS-Win picks up the wrong SSH for some reason (although it should not when you use “map network drive” from explorer).

billziss-gh avatar Sep 12 '18 16:09 billziss-gh

I've removed W10's OpenSSH client, then renamed all found instances of ssh.exe within PATH and mounted the volume. This postponed the failure some time but it doesn't helped.

When I try to make the same from console:

eRIZ@host C:\Program Files\SSHFS-Win\bin
$ "c:\Program Files\SSHFS-Win\bin\sshfs.exe" -d  [email protected]:/workspace W:
SSHFS version 3.4.0
executing <ssh> <-x> <-a> <-oClearAllForwardings=yes> <-2> <[email protected]> <-s> <sftp>
failed to execute 'ssh': No such file or directory
read: Connection reset by peer

eRIZ@host C:\Program Files\SSHFS-Win\bin
$ ls
cygcom_err-2.dll     cyggssapi_krb5-2.dll  cygkrb5support-0.dll  sshfs-win.exe
cygcrypto-1.0.0.dll  cygiconv-2.dll        cygpcre-1.dll         sshfs.exe
cygfuse-3.2.dll      cygintl-8.dll         cygwin1.dll
cyggcc_s-seh-1.dll   cygk5crypto-3.dll     cygz.dll
cygglib-2.0-0.dll    cygkrb5-3.dll         ssh.exe

Then tried to start ssh.exe:

$ ssh -vvvv
{PID:2520} failed to run shell (2): No such file or directory
{PID:2520} shell: `ssh`
{PID:2520}   dir: `/cygdrive/c/Program Files/SSHFS-Win/bin`

Any ideas?

er1z avatar Sep 12 '18 22:09 er1z

From the command line open a cmd.exe prompt (not Cygwin) and try:

>cd \Program Files\SSHFS-Win
>PATH=C:\Program Files\SSHFS-Win\bin
>sshfs.exe ...

This should find the correct ssh.exe for you.

billziss-gh avatar Sep 13 '18 00:09 billziss-gh

Now I have correctly mounted a resource using CLI but issue still occurs. :(

Trying with -d…

er1z avatar Sep 13 '18 21:09 er1z

This is a log content just before failure:

  [476598]          ATTRS       41bytes (2ms)
[476599] LSTAT
  [476599]         STATUS       33bytes (42ms)
[476600] OPENDIR
  [476600]         STATUS       28bytes (7ms)
[476601] LSTAT
  [476601]          ATTRS       41bytes (13ms)
[476602] OPENDIR
  [476602]         STATUS       28bytes (1ms)
[476603] LSTAT
  [476603]          ATTRS       41bytes (1ms)
[476604] LSTAT
  [476604]          ATTRS       41bytes (1ms)
[476605] LSTAT
  [476605]          ATTRS       41bytes (1ms)
[476606] LSTAT
  [476606]          ATTRS       41bytes (1ms)
[476607] LSTAT
[476608] LSTAT
  [476607]          ATTRS       41bytes (2699ms)
  [476608]          ATTRS       41bytes (1766ms)
[476609] EXTENDED
[476610] EXTENDED
  [476609] EXTENDED_REPLY       97bytes (25ms)
  [476610] EXTENDED_REPLY       97bytes (10ms)
[476611] LSTAT
  [476611]          ATTRS       41bytes (2ms)
[476612] LSTAT
  [476612]          ATTRS       41bytes (1ms)
[476613] EXTENDED
  [476613] EXTENDED_REPLY       97bytes (1ms)
[476614] LSTAT
  [476614]         STATUS       33bytes (2ms)
[476615] OPENDIR
  [476615]         STATUS       28bytes (1ms)
[476616] EXTENDED
  [476616] EXTENDED_REPLY       97bytes (1ms)
[476617] LSTAT
  [476617]          ATTRS       41bytes (1ms)
[476618] EXTENDED
  [476618] EXTENDED_REPLY       97bytes (1ms)
[476619] EXTENDED
  [476619] EXTENDED_REPLY       97bytes (13ms)

er1z avatar Sep 14 '18 06:09 er1z

@er1z I am not sure why the failures persist.

As a last resort consider getting the latest SSHFS from libfuse/sshfs and compiling it yourself in your Cygwin environment. You will need the latest beta of WinFsp and to also install the included "FUSE for Cygwin".

billziss-gh avatar Sep 14 '18 19:09 billziss-gh

Tried to mix up with a different instances of Windows and — for some weird reason — Windows 8 within VM seems to be working ok. But how to debug this?

er1z avatar Sep 20 '18 18:09 er1z

@er1z I strongly suspect that this is a system specific issue. Perhaps some kind of conflict between your existing Cygwin system and SSHFS-Win. Unfortunately Windows is a very complex system and it is not always straightforward to debug such problems.

billziss-gh avatar Sep 21 '18 02:09 billziss-gh

On two of my computers something weird occurs; is it really has to be so instance specific to make this working correctly? Dokan(y) hadn't had so many issues. :/

er1z avatar Sep 25 '18 17:09 er1z

I've recreated the issue on the fourth computer with clean OS. Steps:

  1. Create a directory on Linux VM with many small files. For example, a new project with Symfony standard edition with a few packages,
  2. Mount the directory using SSHFS-WIN on Windows-host,
  3. Run PHPStorm and let it index everything within directory,
  4. File->Invalidate caches and restart -> Invalidate and restart,
  5. Repeat this several times,
  6. Observe that drive is neither accessible nor readable.

A workaround is to kill sshfs-win.exe, sshfs.exe and re-establish connection via net use X: \\sshfs…. Please, look at this because you have much greater knowledge on Windows/FS internals than me. It's definitely not system specific issue if many small files are being processed on several computers.

er1z avatar Oct 16 '18 12:10 er1z

The real problem is that when any error happens on the ssh level of connection, it is not detected and it is not reported back by sshfs or any higher wrapper like winfsp. All this is "lost in translation" and sshfs just hangs. SSHFS with default options doesn't even handle general server timeouts like common ssh clients, it only detects closed connections. The only way user can solve this is to get meaningful info from debug log from the ssh level and fix the ssh problem.

There are also other things, that may interfere like Windows auto-disconnect mechanism for network drives.

pavelxkrejci avatar Nov 16 '18 09:11 pavelxkrejci

Within one of the systems I've managed to connect via sshfs.exe, i'll post update upon establishing connection on the other my systems.

Now I have to test how it behaves with many small files.

edit: IO hang still occurs.

er1z avatar Nov 16 '18 11:11 er1z

I'm having this exact same issue as well - any updates on it? Some sort of auto-reconnect policy perhaps?

baxxos avatar Feb 14 '19 08:02 baxxos

I've talked to a teammate who also uses WinFSP + SSHFS and he also encountered this problem.

er1z avatar Feb 26 '19 09:02 er1z

I also seem to be having this issue. The mount is from a Windows 7 machine to Devuan Linux box. The share was mounted with net use z: \\sshfs.r\USER@IP\media\share /persistence:on. The share also seems to drop and become unavailable randomly after large amounts of IO. Sometimes I can double-click on the drive and it will work again, and sometimes I need to reboot the Windows box to get it to work again. I'm using the 3.5 BETA of sshfs-win and 2019.2 of WinFsp.

bcHelix avatar May 27 '19 03:05 bcHelix

@bcHelix — it's not necessary to reboot but you have to kill any processes under sshfs (especially ssh). @billziss-gh — any chances to get through this?

er1z avatar May 27 '19 12:05 er1z

Exact same issue here, vscode's language server yells at me because it fails to enumerate the files in .git and then I get the permissions denied.

FrobtheBuilder avatar Jun 15 '19 10:06 FrobtheBuilder

I believe I am also experiencing this issue. I like to stream music from my home server but the main server is unavailable which normally runs MPD—instead I am using SSHFS-Win as a temporary workaround. When there is no traffic on the line everything is fine, but when everyone comes in and my connection slows a little, the process will kill itself and the drive will become empty. Due to using private keys with passwords the easiest way to get things to work was to invoke SSHFS like this:

@ECHO OFF
TITLE X:
REM you may have to do this if your ssh.exe is getting clobbered by system ssh.exe, or if you do not have SSHFS-Win on your %PATH%
CD "C:\Program Files\SSHFS-Win\bin"
:loop
REM uncomment if you only mount one drive at a time to clean up old processes after a crash
REM taskkill /F /IM sshfs.exe
REM taskkill /F /IM sshfs-win.exe
sshfs-win cmd [email protected]: X: -f -o uid=-1,gid=-1,umask=027,create_umask=027
GOTO loop

And whenever the music stops after say a minute or two, I go back to the prompt, re-enter my password, and a new process is spawned. Whatever program I am accessing the files with gets error 995, before all the files cease to exist.

The I/O operation has been aborted because of either a thread exit or an application request.

I may be using things incorrectly but I haven't been able to get SSHFS-Win to work any other way with password-protected private keys.

I'm not used to Windows file redirection so I can't get any logs from -d -o LogLevel=DEBUG3...

I am using 3.5.19106.

sevmonster avatar Aug 23 '19 18:08 sevmonster

Thanks @ds6 for that script! I have a python script that monitors the directory and remounts when needed. That error 995 was driving me crazy, but after seeing your script I realized I can just kill the process. Thank you!

JeremyKennedy avatar Sep 10 '19 20:09 JeremyKennedy

@JeremyKennedy No problem. I've updated it to respawn more reliably. It took me a moment to realize why the drive would still be there even after killing the sshfs-win process, because it's actually the sshfs process that creates the drive :)

sevmonster avatar Oct 14 '19 13:10 sevmonster

I got SSHFS to crash. I don't know if this is related to encoding or high I/O. I will look into it later.

SSHFS version 3.5.2 FUSE library version 3.2

The service sshfs has been stopped.
Exception: STATUS_ACCESS_VIOLATION at rip=001801A412E
rax=0000000000000007 rbx=0000000000000000 rcx=0000000000000000
rdx=000000000000003A rsi=0000000000000003 rdi=000000000429C670
r8 =000000000000025F r9 =0000000180208720 r10=0000000000000000
r11=0000000100401788 r12=000000000429C681 r13=3A64726F77737361
r14=0000000000000000 r15=0000000000000004
rbp=000000000429C680 rsp=000000000429C638
program=C:\Program Files\SSHFS-Win\bin\sshfs.exe, pid 1733, thread
cs=0033 ds=002B es=002B fs=0053 gs=002B ss=002B
Stack trace:
Frame        Function    Args
0000429C680  001801A412E (0000429C66F, 00000000001, 00000000009, 00000000004)
0000429C680  001004017EF (00000000000, 0017C06D460, 0000429C54F, 00000000000)
00000000000  00100402E25 (00000000048, 00000000000, 00000000080, 0000429CAC0)
00000000000  00100404415 (001800F6CB2, 0000429CAE0, 006001913B0, 000FFFFFFFF)
00000000000  0010040473D (00600000000, 70006E00690077, 00000640069, 00000000000)
00000000000  00100405375 (001800FA30C, 0000000000B, 19DB1DED53E8000, 00600136810)
000007F99B0  00100409190 (7FFB893B71D6, 00000000000, 00000000000, 00000000000)
000007F99B0  00100409685 (00000000000, 0000087C080, 0060007FB60, 00000000000)
000007F99B0  7FFB75D828E0 (00000000000, 000FFFFFFFF, 00000000058, 00000000000)
000007F99B0  7FFB75D87214 (0000087C080, 00000868030, 00000000000, 000007F99B0)
000007F99B0  7FFB75D93C5A (000FFFFFFFF, 0000087C080, 000FFFFFFFF, 00000868030)
000007F99B0  7FFB75D962D1 (00000000000, 00000000058, 7FFB75D960D0, 00000000000)
000007F99B0  00180048353 (00000000000, 00000000000, 00000000000, 00000000000)
00000000000  00180048404 (00000000000, 00000000000, 00000000000, 00000000000)
00000000000  7FFB89397974 (00000000000, 00000000000, 00000000000, 00000000000)
00000000000  7FFB8A28A271 (00000000000, 00000000000, 00000000000, 00000000000)
End of stack trace (more stack frames may be present)
The service sshfs has been started.
The service sshfs has been stopped.
Exception: STATUS_ACCESS_VIOLATION at rip=001801A412E
rax=0000000000000007 rbx=0000000000000000 rcx=0000000000000000
rdx=000000000000003A rsi=0000000000000003 rdi=00000000039EC560
r8 =000000000000025F r9 =0000000180208720 r10=0000000000000000
r11=0000000100401788 r12=00000000039EC571 r13=3A64726F77737361
r14=0000000000000000 r15=000000000000000B
rbp=00000000039EC570 rsp=00000000039EC528
program=C:\Program Files\SSHFS-Win\bin\sshfs.exe, pid 1738, thread
cs=0033 ds=002B es=002B fs=0053 gs=002B ss=002B
Stack trace:
Frame        Function    Args
000039EC570  001801A412E (000039EC55F, 00000000001, 00000000009, 00000000004)
000039EC570  001004017EF (00000000000, 0017C91D570, 000039EC43F, 000039EC791)
00600276E70  00100402E25 (00000000048, 00600276E70, 000039ECAE0, 000039EC9B0)
00600276E70  00100404415 (006001804E0, 000006AC730, 001800C880C, 00000000000)
00600276E70  0010040473D (006001804E0, 000006AC730, 0060017C110, 00000000000)
00600276E70  00100405375 (00028000000, 00000000000, 00000000000, 0060014FD40)
0060017C110  00100408D1B (7FFB00000000, 00000000020, 00600000000, 70006E00690077)
000039ECB60  0010040993C (19DB1DED53E8000, 00000000000, 7FFB893B71D6, 000039ECAE0)
00000000000  7FFB75D827D7 (000006DD4A0, 000006E0C50, 00000730200, 000039ECB80)
000006DD4A0  7FFB75D88601 (000007341D0, 000006DD4A0, 00000000000, 00000610000)
000006DD4A0  7FFB75D94730 (000FFFFFFFF, 000007341D0, 000FFFFFFFF, 00000730200)
000006DD4A0  7FFB75D962D1 (00000000000, 00000000258, 7FFB75D960D0, 00000000000)
000006DD4A0  00180048353 (00000000000, 00000000000, 00000000000, 00000000000)
00000000000  00180048404 (00000000000, 00000000000, 00000000000, 00000000000)
00000000000  7FFB89397974 (00000000000, 00000000000, 00000000000, 00000000000)
00000000000  7FFB8A28A271 (00000000000, 00000000000, 00000000000, 00000000000)
End of stack trace (more stack frames may be present)

sevmonster avatar Dec 10 '19 22:12 sevmonster

I had this error too, but restarting 'WinFsp.Launcher' service worked for me. ^_^

cyraid avatar Nov 03 '20 20:11 cyraid

I had the same error, but reinstalling WinFsp and SSHFS-Win alone did not help for me. After deleting all associated Keys in the Registry and reinstalling both, it works again. At least for a couple of days... :/

mvoelk avatar Jan 23 '21 13:01 mvoelk