usbipd-win
usbipd-win copied to clipboard
USBIPd lockup about once a month
I'm using usbipd between Windows-11 and WSL2-Ubuntu to connect a serial device to WSL2. For the most part it works great, but about once a month it locks up. I have a Python program running that obtains data from the USB connected device every 5 seconds. To recover from this lockup I have to detach/attach (usbipd wsl detach --busid 1-4 / usbipd wsl attach --busid 1-4). And than, after a restart of my program it works again. There is no need to powercycle the device with USB (serial) port. Dates seen a lockup so far (so often far apart):
2023-06-22 15:46
2023-06-26 23:48
2023-08-09 19:15
2023-09-17 15:34
2023-10-10 01:44
I looked at https://github.com/dorssel/usbipd-win/wiki/Troubleshooting and this time I started with trace level logging. I see that the logging is without timestamps, so I wonder if this logging would be helpful for my issue? Any additional advice how I could debug this issue?
Version details: Windows:
systeminfo | findstr /B /C:"OS Name" /B /C:"OS Version"
OS Name: Microsoft Windows 11 Pro
OS Version: 10.0.22000 N/A Build 22000
usbipd --version
3.2.0+38.Branch.master.Sha.7ad7b9904ac0db8d759e3a53a22b1126da01b2be
WSL2:
$ cat /etc/os-release | grep VERSION=
VERSION="22.04.3 LTS (Jammy Jellyfish)"
$ uname -r
5.15.90.1-microsoft-standard-WSL2
$ apt list --installed hwdata linux-tools*
Listing... Done
hwdata/jammy,now 0.357-1 all [installed]
linux-tools-5.15.0-60-generic/now 5.15.0-60.66 amd64 [installed,local]
linux-tools-5.15.0-60/now 5.15.0-60.66 amd64 [installed,local]
linux-tools-5.15.0-86-generic/jammy-updates,jammy-security,now 5.15.0-86.96 amd64 [installed,automatic]
linux-tools-5.15.0-86/jammy-updates,jammy-security,now 5.15.0-86.96 amd64 [installed,automatic]
linux-tools-common/jammy-updates,jammy-security,now 5.15.0-86.96 all [installed,automatic]
linux-tools-virtual/jammy-updates,jammy-security,now 5.15.0.86.83 amd64 [installed]
$ update-alternatives --display usbip
usbip - auto mode
link best version is /usr/lib/linux-tools/5.15.0-60-generic/usbip
link currently points to /usr/lib/linux-tools/5.15.0-60-generic/usbip
link usbip is /usr/local/bin/usbip
/usr/lib/linux-tools/5.15.0-60-generic/usbip - priority 20
USB device/driver: A particulate matter meter (fijnstof meter). https://github.com/msillano/tuyaDAEMON/wiki/custom-device-'PM-detector':-case-study
$ lsusb -s 001:005
Bus 001 Device 005: ID 1a86:7523 QinHeng Electronics CH340 serial converter
Any additional advice how I could debug this issue?
https://github.com/dorssel/usbipd-win/wiki/Troubleshooting also shows how you can create captures (PCAPs). That's the most detailed logging available. You can also capture on the Linux side (usbmon), but it's a bit harder to set up. syslog messages are also helpful.
4.0.0 has been released which fixes a (very small) memory leak. This could explain your issue. Can you confirm?
Thanks for the update. I installed version 4.0.0 on 7-Dec. At that time 3.2.0 was running for 6 wks.
Any news on this?
Hi, Uptime is now: 'active (running) since Wed 2024-01-03 13:45:54 CET; 1 month 1 day ago' (I did had to restart a few times after installing 4.0.0 for other reasons ...)
Are you monitoring the memory usage of the usbipd
service?
Hi, No I was not..., only overall memory usage (which goes up and down due to WSL2). Just added a graph to MRTG for usbipd.exe
Update on a likely unrelated occurrence.
On 2024-02-29 at 20:28 the PM-Monitor stopped working. This was after 1 month and 26 days uptime (started 2024-01-03 13:45:54). This time with a different failure-mode than for which I opened the ticket. I cannot exclude that this got raised due to a connection glitch (e.g. connector accidentally touched). The Win11 event log also show some logs at the time of the issue.
Memory usage looks to increase from 10MB to about 11MB at the moment the issue occurred (2024-02-29 20:26:48 CET). Because the CH340-based serial-USB device was detached from usbipd.exe the setup stopped (no auto restart). After attaching the CH340-based serial-USB device and restarting PM-Monitor process (2024-03-01 19:19:55 CET) I see another memory usage increase from 11MB to 12MB. I have a graph of this ~20% memory usage increase. usbipd.exe wasn't restarted, so maybe it is normal to have a bit more memory in use?
Because I didn't have to restart usbipd.exe for this occurrence I assume it to be a different issue than for which I opened the ticket.
PM-Monitor restart scripts
startUSBipD.bat
@echo off
echo "Starting usbipd..."
:search
tasklist | find "wsl.exe"
IF %ERRORLEVEL% == 0 GOTO :found
TIMEOUT /T 1
GOTO :search
:found
TIMEOUT /T 3
python C:\Users\rhermsen\usbipd_start.py
TIMEOUT /T 30
wsl.exe -u root service pm_monitor restart
echo "Done"
usbipd_start.py
# -*- coding: utf-8 -*-
"""
# execute usbipd at startup
#
# rhermsen
# version 2.0
#
Created on Fri Jun 16 21:20:07 2023
Available commands upto 3.2.0:
usbipd wsl list
usbipd wsl
gives usage info
usbipd wsl list
usbipd wsl attach -h
usbipd wsl detach -h
Available commands from 4.0.0:
usbipd list
usbipd
gives usage info
usbipd list
usbipd attach -h
usbipd detach -h
@author: rhermsen
"""
import subprocess
def executeCliCmd(command):
'''
Parameters
----------
command : string, The full command line, including the parameters to execute.
DESCRIPTION.
Executes a command with optional parameters. The resulting output is returned in a list of strings.
Returns
-------
List of strings where each line is an entry in this list.
'''
p = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
out, err = p.communicate()
outString = str(out)
# result = outString.split('\\r\\n')
# print(result)
return outString.split('\\r\\n')
def findBusidAndState(list, match='CH340'):
attached = False
busid = None
for line in list:
wordList = line.split()
if match in wordList:
busid = wordList[0]
if "Attached" in wordList:
attached = True
# print("busid =", busid, "attached", attached)
return busid, attached
list_command = 'usbipd list'
usb_list = executeCliCmd(list_command)
busid, attached = findBusidAndState(usb_list)
attach_command = f'usbipd attach --wsl --busid {busid}'
#print("attach_command:",attach_command)
if busid != None and attached == False:
executeCliCmd(attach_command)
elif attached == True:
print("PM-Monitor USB already attached to WSL2")
else:
print("Could not attach PM-Monitor.")
Logging
WSL
Error messages in journalctl:
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: ch341-uart ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: connection reset by peer
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: stop threads
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: release socket
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: disconnect device
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: usb 1-1: USB disconnect, device number 2
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: usb 1-1: failed to send control message: -19
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: ch341-uart ttyUSB0: ch341-uart converter now disconnected from ttyUSB0
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: ch341 1-1:1.0: device disconnected
similar in dmesg:
[Thu Feb 29 20:27:01 CET 2024] ch341-uart ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: connection reset by peer
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: stop threads
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: release socket
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: disconnect device
[Thu Feb 29 20:27:01 CET 2024] usb 1-1: USB disconnect, device number 2
[Thu Feb 29 20:27:01 CET 2024] usb 1-1: failed to send control message: -19
[Thu Feb 29 20:27:01 CET 2024] ch341-uart ttyUSB0: ch341-uart converter now disconnected from ttyUSB0
[Thu Feb 29 20:27:01 CET 2024] ch341 1-1:1.0: device disconnected
Win11 event logs
(sometimes with a few events before the issue, and more after the issue): USBIPd.exe event 2024-02-29 2026.evtx: USBIPd.exe event 2024-02-29 2026.evtx.zip Win Application logs USBIPd.exe event 2024-02-29 2026.evtx: Win Application logs USBIPd.exe event 2024-02-29 2026.evtx.zip Win System logs USBIPd.exe event 2024-02-29 2026.evtx: Win System logs USBIPd.exe event 2024-02-29 2026.evtx.zip
Thanks for the elaborate report.
Memory usage seems correct. usbipd-win
is written in C#, a managed language using a garbage collector. It's normal for a new connection to have a temporary increase in memory consumption. It will take a while before the garbage collector will kick in.
The memory bug that I fixed earlier leaked 40 bytes per URB + a kernel event handle. I could see it in memory consumption clearly after about 100.000 URBs. I guess in your case it was the kernel handles that caused the original crash. I went up to 1 million URBs, but couldn't actually reproduce the crash. Can you guestimate your URB-rate? If that is indeed > 1 million per month it could explain the original crash.
With the current behavior, I think we can safely say the original problem is fixed.
Is it only in one direct this issue was present?
My stab at guestimate the URB-rate: I have a tcpdump capture and assuming each URB ends up in a separate TCP segment I count
- Win > WSL 10 segments from which 1 TCP segment without data (e.g. TCP Ack only) = 9
- WSL > Win 12 segments from which 3 TCP segment without data (e.g. TCP Ack only) = 9
- Same amount of data is requested every 5 seconds
- ( 9 / 5 ) * ( 60 * 60 * 24 * 365) /12 = 4730400 URB/month in each direction
Is it only in one direct this issue was present?
Usually, for each URB there is one USBIP request and one USBIP reply. It also makes sense that a single "poll" from your software consists of several URBs. Of course, when either the request or the reply doesn't fit in the MTU there will be several TCP fragments. But as a "guestimate", the 4 million makes sense.
To comment on your last update: Indeed I see a single poll to be split in several URB's each in its own TCP segment. There is no fragmentation (largest packet is 146 MB only).
Unfortunately there was another similar occurrence as on 2024-02-29.
On 2024-03-07 11:04 the PM-Monitor stopped working in a similar way as on 2024-02-29 at 20:28. This was after only 5 days uptime (started 2024-03-01 19:19:55). This time nobody was near the device. The Win11 event log doesn't show much this time...
Category: Usbipd.ConnectedClient
EventId: 2
Client 192.168.188.87 released device at 1-4 (USB\VID_1A86&PID_7523\5&9BB4271&0&4).
Memory usage was the same as after last week issue, 12MB until I restarted USBIPd.exe. But after I restarted with debug and packet capture (pcap) enabled it jumped to 22MB. (I assume a bit this is due debug and/or a packet capture running).
I hope the capture isn't growing too fast, but from the 'guestimate' it probably will...
I observed another occurrence (2024-05-14, 23:11) after 1 month and 6 days. This time with capturing in place. I was still using usbipd 4.0.0 (capturing timestamps incorrect). This was just before my holiday, so I only had little time to investigate. Now I'm back from holiday.
At the moment the issue starts I see an 'address' change at the server side (windows) from 1.4.2 to 1.4.1 The requests coming from WSL2 (Ubuntu) are still going to 1.4.2. There is a response packet from 1.4.2 followed by 1.4.1 from the server side. All these packets are marked as 'URB transfer type' 'URB_INTERRUPT' in the IN direction. This repeats at every request coming from WSL2, every 5 seconds.
In the mean time I upgraded to USBIPd 4.2.0.
The complete capture is 1.19GB.
I extracted the last 2 correct sequences, and 2 incorrect sequences (named: pm-monitor_20240514__2311__last_successful_and_failed2.pcapng.txt
):
packet 21 to 38 (=18), expected request response
packet 39 and 40 look a bit unexpected
packet 41 to 70 (=30, so much more packets) the request looks good, but response not.
packet 71 to 114 (=44 even more packets) the request looks good, but response not.
and this repeats every 5 seconds
Do you see what goes wrong here? pm-monitor_20240514__2311__last_successful_and_failed2.pcapng.txt
1.4.2 is a bulk endpoint. In Linux terminology, it acts like a block device. You request a read, and you get a reply, always. 1.4.1 is an interrupt endpoint. In Linux terminology, it acts like a character device. You request a read, and it blocks until there is actually something to read., which may be immediately or maybe never at all.
Normally, your device apparently uses 1.4.2. And every IN request completes successfully. Then, in your log line 39. It does not respond. This is the error. A bulk pipe should always respond immediately to a read request. But this time it doesn't. It ends with ETIME, which comes from VBoxUSB where the error code is called USBSUP_XFER_DNR, which it in turn got from Windows where the error code is called USBD_STATUS_DEV_NOT_RESPONDING.
Then your client side driver again submits an IN request for 1.4.2, but it never completes. Besides that, it starts sending requests to 1.4.1.
The root cause is the device not responding (ETIME) in the first place.
If I get it correct it the USB device (the PM-monitor) connected to the Windows machine that is 'misbehaving' by having a timeout. (this is seen from packet 39 having status ETIME (=Timer expired)). It is at the hardware side the timeout occurred, so I'm not able to solve it for future similar events.
For all previous events I never had to power-cycle the PM-Monitor to restore communication (required steps: usbipd detach/attach and restart of my python script). This time I'm not sure as I powered off all involved equipment. Is there a retry mechanism possible, or a workaround to initiate a restart e.g. from WSL2?
If I get it correct it the USB device (the PM-monitor) connected to the Windows machine that is 'misbehaving' by having a timeout.
Looks like it. It is in any case the error that the VBoxUSB driver got from Windows. Normally, when devices get "stuck" it is called a STALL. And the error for that is EPIPE. Client drivers usually handle that by sending a "clear halt" (pipe reset) via channel 0 (the control channel). Maybe (just maybe) that may work for ETIME as well. And I have to admit, the translation for this error to ETIME is mine, based on https://www.kernel.org/doc/html/latest/driver-api/usb/error-codes.html. I/You/We could try to return EPIPE instead, hoping that the driver responds to that with a pipe reset, which just may work (or not)...
Translation happens here: https://github.com/dorssel/usbipd-win/blob/master/Usbipd/Tools.cs#L109
@dorssel If it is possible to have an build with EPIPE returned i.s.o. ETIME that would be great. (I see you did that sometime in the past https://github.com/dorssel/usbipd-win/actions/runs/7308106135?pr=812)
Btw, are there any instructions available to built usbipd from source?
Btw, are there any instructions available to built usbipd from source?
Simply run the commands from the GitHub build script: https://github.com/dorssel/usbipd-win/blob/master/.github/workflows/build-installer.yml
First time I look at GitHub Actions...
I did manage to build an msi-installer, only it is for version 0.2.0 (usbipd-win_0.2.0.msi). Steps done:
- Forked your usbipd-win repo: https://github.com/rhermsen/usbipd-win
- enabled actions for all workflows
- changed and committed line https://github.com/rhermsen/usbipd-win/blob/master/Usbipd/Tools.cs#L115
- Found I also need to change and commit line https://github.com/rhermsen/usbipd-win/blob/master/UnitTests/Tools_Tests.cs#L193
- After a few minutes this resulted in a successful installer: https://github.com/rhermsen/usbipd-win/actions/runs/9800638801
- Uninstalled the current 4.2.0 version, and installed the new build version 0.2.0
But after this I get an exception regarding VBoxUsb driver (see debug log below). But now also after installing 4.2.0 I get the same exception message.
usbipd --version
0.2.0+1052.Branch.master.Sha.716112dce471b303d6de1ddd8ba7273823bd816e
C:\WINDOWS\system32>usbipd server Logging:LogLevel:Default=Trace "usbipd:PcapNg:Path=C:\Users\rhermsen\pm-monitor_20240706__0155.pcapng"
dbug: Microsoft.Extensions.Hosting.Internal.Host[1]
Hosting starting
dbug: Usbipd.PcapNg[1000]
usbipd:PcapNg:Path = 'C:\Users\rhermsen\pm-monitor_20240706__0155.pcapng'
dbug: Usbipd.PcapNg[1000]
usbipd:PcapNg:SnapLength = unlimited
dbug: Usbipd.Server[1000]
usbipd:Port = 3240
dbug: Usbipd.Server[1000]
0.2.0+1052.Branch.master.Sha.716112dce471b303d6de1ddd8ba7273823bd816e
info: Microsoft.Hosting.Lifetime[0]
Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
Content root path: C:\Program Files\usbipd-win\
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
Hosting started
dbug: Usbipd.PcapNg[1000]
Flushing
dbug: Usbipd.Server[1000]
new connection from 192.168.188.87
dbug: Usbipd.Server[1000]
connection close: Unable to read beyond the end of the stream.
dbug: Usbipd.Server[1000]
connection closed
dbug: Usbipd.Server[1000]
new connection from 192.168.188.87
dbug: Usbipd.ConnectedClient[1000]
Received opcode: OP_REQ_IMPORT
fail: Usbipd.ConnectedClient[3]
An exception occurred while communicating with the client:
System.IO.FileNotFoundException: Unable to find the specified file.
at Usbipd.ConfigurationManager.GetVBoxDevice(BusId) + 0x128
at Usbipd.VBoxUsb.<ClaimDeviceOnce>d__0.MoveNext() + 0x6d
--- End of stack trace from previous location ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xb2
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4b
at Usbipd.VBoxUsb.<ClaimDevice>d__1.MoveNext() + 0x145
--- End of stack trace from previous location ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
at Usbipd.VBoxUsb.<ClaimDevice>d__1.MoveNext() + 0x202
--- End of stack trace from previous location ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xb2
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4b
at Usbipd.ConnectedClient.<HandleRequestImportAsync>d__8.MoveNext() + 0xd99
--- End of stack trace from previous location ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
at Usbipd.ConnectedClient.<HandleRequestImportAsync>d__8.MoveNext() + 0x1823
--- End of stack trace from previous location ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xb2
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4b
at Usbipd.ConnectedClient.<RunAsync>d__5.MoveNext() + 0x300
dbug: Usbipd.Server[1000]
connection close: Unable to find the specified file.
dbug: Usbipd.Server[1000]
connection closed
The version number is inconsequential. It is automatically determined from the tags (using GitVersion). Your clone does not have the release tags, so it "restarts" from 0.x. That's not a problem.
Drivers have been bumped, and for development builds that causes issues. Simply: uninstall whatever usbipd-win
you have installed now. And then install another version (your dev build, for example). And if that fails (like you reported), Do a repair install. That should fix any driver versions.
EDIT: And as always with Windows, when things go wrong with driver installs: a reboot in between steps helps ...
I uninstalled, repaired, downgraded/upgraded, rebooted usbipd-win
multiple times, but only got it resolved after uninstalling a VirturalBox USB device via 'Device Manager':
Now it is running with version 0.2.0
Oh well, dev versions .... Good luck with this one.
I guess you had it forced bound?
No, didn't use a 'force' option. Only used the 'usbipd bind --busid 1-4' and 'usbipd attach --wsl --busid 1-4'.
Started from a working usbipd 4.2.0 > tried install usbipd 0.2.0 but this failed due to lower version as the one already installed > uninstall 4.2.0 > (didn't reboot...) installed 0.2.0 (don't think I rebooted) and at this point ran into that issue until I found (after a lot of steps) the 'VituralBox USB' device in de 'Device Manager'.