sshfs stuck in FspFileSystemRemoveMountPoint

Open i3v opened this issue 3 years ago • 2 comments

The sshfs-win drive works great for a while, but hangs after roughly 24 hours.

The network is reasonably stable: at least, a PuTTY session to the same server has stayed active for a few weeks already. However, there is a chance that the underlying nfs->zfs connection on the server has occasional problems, and there may be other factors I am not aware of.

The symptoms I see on the Windows side are:

  • net use T: still reports the connection as active, but I cannot cd there:

    c:\>net use T:
    Local name        T:
    Remote name       \\sshfs.r\user@host
    Resource type     Disk
    The command completed successfully.
    
    
    c:\>T:
    Insufficient system resources exist to complete the requested service.
    
    c:\>
    
  • If I try to open a new Windows Explorer window (e.g. Win+E), it just hangs.

  • Clicking on "T:" icon in a "Save As" dialog in the Windows "Snipping Tool" results in:

    [Window Title]
    Location is not available
    
    [Content]
     T:\ is not accessible.
    
    Insufficient system resources exist to complete the requested service.
    [OK]
    
  • sshfs.exe is using 2 CPU cores at 100%, with 2 active threads:

    • winfsp-x64.dll!FspFileSystemRemoveMountPoint+0xa0 Capture

    • ntdll.dll!RtlReleaseSRWLockExclusive+0x40

      ntoskrnl.exe!KeSynchronizeExecution+0x5c26
      ntoskrnl.exe!KeWaitForSingleObject+0x12e6
      ntoskrnl.exe!KeWaitForSingleObject+0xadb
      ntoskrnl.exe!KeWaitForSingleObject+0x1ff
      ntoskrnl.exe!ExWaitForRundownProtectionRelease+0x9fa
      ntoskrnl.exe!KeWaitForSingleObject+0x31cb
      ntoskrnl.exe!KeSynchronizeExecution+0x2e02
      ntdll.dll!NtDelayExecution+0x14
      KERNELBASE.dll!SleepEx+0x9a
      cygwin1.dll!strtosigno+0x305
      cygwin1.dll!sigfillset+0xa7f5
      cygwin1.dll!sigfillset+0xaa88
      cygwin1.dll!_main+0x4c5
      cygwin1.dll!_main+0x502
      cygwin1.dll!strtosigno+0x354
      cygwin1.dll!sigfillset+0xa7f5
      cygwin1.dll!sigfillset+0xaa88
      cygwin1.dll!_main+0x4c5
      cygwin1.dll!_main+0x502
      cygwin1.dll!setprogname+0x2c21
      cygwin1.dll!setprogname+0x411e
      cygwin1.dll!setprogname+0x41d4
      KERNEL32.DLL!BaseThreadInitThunk+0x14
      ntdll.dll!RtlUserThreadStart+0x21
      

    I might be completely wrong, but this looks a bit like WinFsp is trying to re-mount the disk, as discussed here, even though I have not added the Recovery DWORD yet (I'm not sure whether it is already 1 by default in the version I use).

Note that I use OSFMount v3.1 (1000) to mount the "*.img" that is stored on T:\. AFAIU, that's the only open file on T:\. If I click "dismount" there, it hangs indefinitely on the "Notifying applications that device is being removed..." message (not sure if that's the actual reason).

  • Moreover, if I try to kill "OSFMount.exe" process with Sysinternals Process Explorer, I get:

    ---------------------------
    Process Explorer
    ---------------------------
    Error terminating process: Access is denied.
    
    ---------------------------
    OK   
    ---------------------------
    

    Thus, sadly, I was unable to test if this is what blocks sshfs.exe.

  • OSFMount does not behave like that when WinFsp is not involved. Although I cannot say I have a lot of experience with it, I tried to reproduce the "unable to read a file" condition for it. It does behave a bit oddly and does not seem to react explicitly to being unable to read the disk image file, but there is nothing like a hanging Windows Explorer.

What actually helps is restarting the "WinFsp.Launcher" service. Everything immediately goes back to normal (e.g. I'm able to start Windows Explorer again). Re-mounting with net use T: \\sshfs.r\user@host and then mounting the image in OSFMount works OK again.
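The workaround above can be sketched as a small script. This is only an illustration, assuming the service name, drive letter, and remote shown in this report; the function names are made up for the example. Stopping and starting a service needs an elevated prompt, so the sketch only prints the commands unless `dry_run` is turned off.

```python
import subprocess

SERVICE = "WinFsp.Launcher"       # service name as given in this report
DRIVE = "T:"                      # drive letter used in this report
REMOTE = r"\\sshfs.r\user@host"   # placeholder remote from this report


def recovery_commands(service=SERVICE, drive=DRIVE, remote=REMOTE):
    """Return the commands for: restart the launcher service, then re-map the drive."""
    return [
        ["net", "stop", service],
        ["net", "start", service],
        ["net", "use", drive, remote],
    ]


def recover(dry_run=True):
    """Print each command; run it only when dry_run is False (needs admin rights)."""
    for cmd in recovery_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    recover(dry_run=True)
```

Calling `recover(dry_run=False)` from an elevated prompt would actually run the three commands in order; the OSFMount image still has to be re-mounted by hand afterwards.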

OS version and build: Windows 10 version 1803, 17134.1304
WinFsp version and build: 1.8.20304, sshfs-win-3.5.20357-x64

I should probably update all of these and try to reproduce the behavior again. I hope to get a chance to do that at some point.

i3v, Apr 10 '21 19:04

Apologies for the late response.

I looked into this issue, but unfortunately I do not have a good answer for you. However, I doubt that the problem is really in FspFileSystemRemoveMountPoint: the relevant stack trace looks completely wrong to me.

billziss-gh, May 22 '21 19:05

I'm very sorry for such a late response as well. Thanks for looking into the issue!

  • I have now been on winfsp-1.9.21096.msi, sshfs-win 3.5.20357, Windows 10 version 20H2 for a few months. Effectively the same issue still happens from time to time.

  • I now guess that OSFMount is only partially related to the issue. These days, when sshfs-win "gets stuck", Windows Explorer does not hang on startup (maybe OSFMount was blocking it somehow, because sshfs-win was blocking OSFMount). The stack is also somewhat different. But the overall "sshfs-win is stuck" effect is the same.

  • I'm still not sure whether AV is somehow involved here.

I found a way to reproduce this (or, at least, a somewhat similar) issue within a few minutes with 100% repeatability. I start 100 processes (a Matlab parallel pool, in this case) that read images (~10MB files) in parallel from the sshfs-win drive (just ~200Mbit/s total throughput). Initially, a freshly started sshfs.exe process consumes only 7.6MB of RAM. Once these processes start to work, RAM usage goes up linearly together with the total "Handles" counter. Once RAM usage reaches 1060MB (the "Handles" counter reaches 16711680 at this point), sshfs-win hangs and its network activity goes to zero.
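For anyone without Matlab, the parallel-read load described above could be approximated with a short Python script. This is a minimal sketch and not the exact workload from this report: the function names and the `T:\images` path are made up for illustration, and whether it triggers the same hang is untested. The default is 60 workers rather than 100 because `ProcessPoolExecutor` does not accept more than 61 workers on Windows.

```python
import os
import sys
from concurrent.futures import ProcessPoolExecutor


def read_whole_file(path, chunk_size=1 << 20):
    """Read one file to the end in 1 MiB chunks and return the byte count."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def list_files(root):
    """Collect every file below root, sorted so runs are repeatable."""
    paths = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def run_readers(root, workers=60, executor_cls=ProcessPoolExecutor):
    """Read all files under root in parallel; return (file count, total bytes)."""
    paths = list_files(root)
    with executor_cls(max_workers=workers) as pool:
        sizes = list(pool.map(read_whole_file, paths))
    return len(sizes), sum(sizes)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python readers.py T:\images
    count, nbytes = run_readers(sys.argv[1])
    print(f"read {count} files, {nbytes} bytes")
```

Watching the "Handles" counter and RAM usage of sshfs.exe in Process Hacker while this runs should show whether the same linear growth appears.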

[Image: RAM use - graph]

The "RAM use" tab in Process Hacker shows a large number of 1MB blocks allocated:

[Image: RAM use - allocs]

If I pause or stop the reader processes (just a normal stop, not killing them abruptly) before sshfs-win hangs (e.g. at ~500MB RAM usage), the RAM usage does not go down. Thus, I guess this is just a leak. I'm not sure what affects its severity. I'm not reading very many files here: well below 100k files before it hangs.
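A quick check of the reported numbers suggests the hang may coincide with a handle ceiling rather than with memory exhaustion as such. The sketch below uses only figures from this report. Reading 16711680 as a hard limit is an assumption: it equals 2^24 - 2^16, the value usually quoted as the per-process handle limit on Windows, but nothing in this thread confirms that this is what stops sshfs.exe.

```python
HANDLES_AT_HANG = 16_711_680   # "Handles" counter when sshfs.exe stops responding
RAM_START_MB = 7.6             # freshly started sshfs.exe
RAM_AT_HANG_MB = 1060          # RAM usage at the moment of the hang


def bytes_per_handle(start_mb, end_mb, handles):
    """Average RAM growth per handle, treating 1 MB as 2**20 bytes."""
    return (end_mb - start_mb) * 2**20 / handles


# The handle count is a round number in hexadecimal.
print(hex(HANDLES_AT_HANG))                  # 0xff0000
print(HANDLES_AT_HANG == 2**24 - 2**16)      # True

# Rough memory cost of each leaked handle, in bytes.
print(bytes_per_handle(RAM_START_MB, RAM_AT_HANG_MB, HANDLES_AT_HANG))
```

If that reading is right, it would also fit the "Insufficient system resources" error and the fact that both counters grow together, but it does not say which handles are being leaked.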

When I copy the same files to my local HDD, nothing goes wrong: the sshfs.exe RAM usage is stable at ~43MB and the "Handles" counter does not grow as fast (although it still seems to grow constantly). Thus, the leak severity probably depends on how the application reads the file.

i3v, Oct 05 '21 01:10