linux
Rpi goes into kernel panic once LTE USB Dongle is disconnected
Describe the bug
Once the LTE USB dongle is disconnected, the errors "nonzero urb status received: -71" and "wdm_int_callback - 0 bytes" are logged indefinitely. The Raspberry Pi freezes and cannot be reached until the system is power cycled.
Steps to reproduce the behaviour
- Connect the LTE USB dongle to any USB port of a Raspberry Pi 3 Model B+
- Wait until it's connected to the Internet (connection is established by using ModemManager)
- Disconnect the USB Dongle
Device(s)
Raspberry Pi 3 Model B+
System
OS and version: Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, fa45ccf5a4b183ee566b36d74fb4b65bf9358bed, stage2
Firmware version: Dec 1 2021 15:07:06 Copyright (c) 2012 Broadcom version 71bd3109023a0c8575585ba87cbb374d2eeb038f (clean) (release) (start)
Kernel version: Linux aqueduct 5.10.103-v7+ #1529 SMP Tue Mar 8 12:21:37 GMT 2022 armv7l GNU/Linux
LTE USB Dongle: Alcatel IK41VE1
Logs
kernel: [ 70.955000] cdc_mbim 1-1.2:1.3: nonzero urb status received: -71
kernel: [ 70.957710] cdc_mbim 1-1.2:1.3: wdm_int_callback - 0 bytes
kernel: [ 71.058078] cdc_mbim 1-1.2:1.3: nonzero urb status received: -71
kernel: [ 71.060809] cdc_mbim 1-1.2:1.3: wdm_int_callback - 0 bytes
kernel: [ 71.602750] ERROR::dwc_otg_hcd_urb_enqueue:501: Not connected
kernel: [ 71.609973] cdc_mbim 1-1.2:1.3: Tx URB error: -19
kernel: [ 71.689118] ERROR::dwc_otg_hcd_urb_enqueue:501: Not connected
kernel: [ 71.696328] option1 ttyUSB0: usb_wwan_write: submit urb 0 failed: -19
kernel: [ 71.707848] ERROR::dwc_otg_hcd_urb_enqueue:501: Not connected
kernel: [ 71.715463] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -19
kernel: [ 71.718163] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
kernel: [ 71.720804] smsc95xx 1-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
kernel: [ 72.462173] option1 ttyUSB0: usb_wwan_write: submit urb 0 failed: -19
kernel: [ 72.579529]
kernel: [ 72.579553] ERROR::dwc_otg_hcd_urb_enqueue:501: Not connected
kernel: [ 72.579553]
kernel: [ 72.591639] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -19
kernel: [ 72.595558] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
kernel: [ 72.599452] smsc95xx 1-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
Aug 2 09:23:32 aqueduct kernel: [ 71.802033] ------------[ cut here ]------------
Aug 2 09:23:32 aqueduct kernel: [ 71.804794] WARNING: CPU: 0 PID: 48 at drivers/net/phy/phy.c:958 phy_error+0x30/0x70
Aug 2 09:23:32 aqueduct kernel: [ 71.807587] Modules linked in: hci_uart btbcm bluetooth ecdh_generic ecc libaes vc4 cec 8021q garp stp llc drm_kms_helper drm drm_panel_orientation_quirks brcmfmac brcmutil snd_soc_core snd_compress snd_pcm_dmaengine syscopyarea sysfillrect sysimgblt fb_sys_fops sha256_generic libsha256 raspberrypi_hwmon backlight cfg80211 rfkill bcm2835_codec(C) bcm2835_v4l2(C) bcm2835_isp(C) v4l2_mem2mem snd_bcm2835(C) bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common snd_pcm snd_timer videodev snd vc_sm_cma(C) mc cdc_mbim cdc_wdm option cdc_ncm usb_wwan usbserial cdc_ether fixed uio_pdrv_genirq uio ip_tables x_tables ipv6
kernel: [ 71.826806] CPU: 0 PID: 48 Comm: kworker/0:3 Tainted: G C 5.10.103+ #1529
kernel: [ 71.830315] Hardware name: BCM2835
kernel: [ 71.833764] Workqueue: events_power_efficient phy_state_machine
kernel: [ 71.837233] Backtrace:
kernel: [ 71.840735] [
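The negative numbers at the end of several log lines are Linux errno values. As a reading aid, the minimal sketch below annotates such lines with the symbolic name. The table is hard-coded with the two codes seen in this log (19 and 71, as defined in the generic Linux errno headers); the helper name `annotate` is hypothetical. In the kernel's USB error-code documentation, `-EPROTO` on a URB covers bit-stuffing errors, a missing response packet, or an unknown USB error, and `-ENODEV` means the device or bus no longer exists, which matches a dongle that has just been unplugged.

```python
import re

# Linux errno values from the generic errno headers; only the two codes
# that appear in the log above are listed here.
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    71: ("EPROTO", "Protocol error"),
}

# Matches a trailing negative status such as "received: -71" or "failed: -19".
STATUS_RE = re.compile(r":\s*-(\d+)\s*$")


def annotate(line: str) -> str:
    """Append the symbolic errno name to a log line that ends in a status code."""
    match = STATUS_RE.search(line)
    if not match:
        return line  # no trailing status code, leave the line untouched
    code = int(match.group(1))
    name, text = LINUX_ERRNO.get(code, ("unknown", "not in table"))
    return f"{line}  <- {name} ({text})"


if __name__ == "__main__":
    for sample in (
        "cdc_mbim 1-1.2:1.3: nonzero urb status received: -71",
        "cdc_mbim 1-1.2:1.3: Tx URB error: -19",
        "cdc_mbim 1-1.2:1.3: wdm_int_callback - 0 bytes",
    ):
        print(annotate(sample))
```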
Additional context
Expected behaviour: Raspberry Pi continues to function normally.
Note: The issue is also sometimes reproducible during system reboot (after calling sudo reboot now).
LTE dongles frequently consume large amounts of power when data connections are active. What happens if you use a self-powered USB hub between the dongle and the Pi?
What is the output of vcgencmd get_throttled when the dongle is active?
@P33M thank you for the quick response!
If I use a self-powered USB hub the behavior is the same.
The result of vcgencmd get_throttled is the following:
throttled=0x0
Please post the full output of sudo lsusb -v with the device plugged in (does not have to be active).
Please see the output in the file. usbs_info.txt
An interesting finding: the issue is not reproducible with the 64-bit Raspberry Pi OS Lite (release date: April 4th 2022), but it is still reproducible with the 32-bit Raspberry Pi OS Lite (release date: April 4th 2022).
Hi @IrynaSemenovych,
Can you try the DWC driver for armv7 and see if it fixes the issue? IIRC, this is enabled using the following in config.txt:
dtoverlay=dwc2
This may, however, cause other issues elsewhere, but it is worth trying.
Hi @JamesH65
Unfortunately enabling the DWC driver for armv7 hasn't fixed anything.
That is interesting. AIUI, it's a completely different driver which doesn't use the FIQ, so that implies that it's not the low-level USB driver causing the issue, nor the custom FIQ code we use to improve the standard USB.
I presume the error you see is exactly the same?
Please post a full dmesg log in the dwc2 case.
@JamesH65 @P33M Sorry that I did not describe the full behavior. I have now tested it with different setups; here are the results and logs:
1. dtoverlay=dwc2 + LTE Modem: the LTE USB modem doesn't connect to the network; no kernel panic when the USB modem is disconnected. dmesg_without_eth.txt
2. dtoverlay=dwc2 + LTE Modem + LAN cable: the LTE USB modem connects to the network and the connection seems stable; when the USB modem is disconnected there is no kernel panic and the Pi functions well. dmesg_with_eth.txt
3. dtoverlay=dwc2 + Raspberry Pi Zero + LAN9514 chip: the USB ports don't work at all and seem to be unpowered (the LED on the modem is off, no reaction to a keyboard). Getting this working on the Raspberry Pi Zero is essential for our use case.
So the results are different depending on the setup, not sure what influences it.
1 & 2 -
[ 8.314512] Under-voltage detected! (0x00050005)
...
[ 14.554409] Voltage normalised (0x00000000)
You have an undervoltage event during boot in both cases, which means your power supply is marginal. USB symptoms include spontaneous disconnects as well as unexplained unreliability when power is flaky. Change the power supply (or micro-USB cable, if using a PSU without a captive cable) for one that doesn't produce an undervoltage message on boot.
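To make the hex values in these messages easier to read, here is a minimal sketch that decodes them. It assumes that the value in parentheses in the kernel message is the same bitmask that `vcgencmd get_throttled` reports, and it uses the bit meanings listed in the Raspberry Pi documentation for that command. The helper names are hypothetical.

```python
import re

# Bit meanings as documented for `vcgencmd get_throttled`.
THROTTLED_BITS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}

# Picks out a hex value in parentheses, e.g. "(0x00050005)".
HEX_IN_PARENS = re.compile(r"\((0x[0-9a-fA-F]+)\)")


def decode_flags(value: int) -> list[str]:
    """Return the description of every bit that is set in `value`."""
    return [text for bit, text in THROTTLED_BITS.items() if value & (1 << bit)]


def decode_log_line(line: str) -> list[str]:
    """Decode the bitmask from a dmesg line; empty list if none is present."""
    match = HEX_IN_PARENS.search(line)
    return decode_flags(int(match.group(1), 16)) if match else []


if __name__ == "__main__":
    print(decode_log_line("[ 8.314512] Under-voltage detected! (0x00050005)"))
    print(decode_log_line("[ 14.554409] Voltage normalised (0x00000000)"))
    # The earlier `vcgencmd get_throttled` result, throttled=0x0, has no bits set.
    print(decode_flags(int("0x0", 16)))
```

Under that assumption, 0x00050005 has bits 0, 2, 16 and 18 set: the board was under-voltage and throttled at that moment, and both conditions are also latched as having occurred since boot.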
Using dwc2 vs dwc_otg and not getting a crash with dwc2 could indicate a bug during disconnect processing that somehow causes a root port disconnect, but in your use-case you have a single high-speed device and there won't be much benefit to using dwc_otg which has specific optimisations for full- and low-speed devices. I recommend using dwc2.
- On Pi Zero boards dwc2 defaults to otg mode. If you don't use a cable that shorts OTGID to ground then the Zero's USB port will be in device mode. Use the line
dtoverlay=dwc2,dr_mode=host to force host mode.
- edit - the line is in /boot/config.txt.
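When comparing setups, it can help to confirm which overlay line is actually active in config.txt. The sketch below is a simplified parser, not a full implementation of the config.txt syntax: it ignores conditional section filters such as `[pi0]` and only looks at uncommented `dtoverlay=` lines. The function name `find_overlays` is hypothetical.

```python
def find_overlays(config_text: str, name: str) -> list[dict]:
    """Return the parameters of every active `dtoverlay=<name>` line."""
    found = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line.startswith("dtoverlay="):
            continue
        overlay, *params = line[len("dtoverlay="):].split(",")
        if overlay != name:
            continue
        # Parameters look like "dr_mode=host"; a bare flag maps to "".
        found.append(
            dict(p.split("=", 1) if "=" in p else (p, "") for p in params)
        )
    return found


if __name__ == "__main__":
    # On a Pi, config_text would be read from /boot/config.txt.
    sample = "# USB controller\ndtoverlay=dwc2,dr_mode=host\n"
    print(find_overlays(sample, "dwc2"))
```

An empty result means no active `dtoverlay=dwc2` line was found in the text that was checked.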
@P33M Thank you for the recommendations, please check out the results of my testing.
- Proper power supply, dtoverlay=dwc2 + LTE Modem
I changed the power supply for the Raspberry Pi 3 Model B, and the LTE modem still doesn't connect to the network: it continuously blinks blue (which indicates that it is trying to connect to the 4G network). dmesg_rpi_3modelB_dwc2.txt
- dtoverlay=dwc2,dr_mode=host + raspberry pi zero + LAN9514 chip
The LTE USB dongle doesn't connect to the network and blinks continuously. dmesg_rpi_zero_dwc2_host.txt
In both cases I noticed the following error:
[ 502.472278] dwc2 20980000.usb: dwc2_hc_chhltd_intr_dma: Channel 4 - ChHltd set, but reason is unknown
[ 502.472303] dwc2 20980000.usb: hcint 0x00000002, intsts 0x04600001
Maybe it will help in resolving the issue.
Are there any other tips & tricks that I should test?
The lines either side of the error are telling.
[ 501.781538] usb 1-1.2.3: new high-speed USB device number 10 using dwc2
[ 502.020277] dwc2 20980000.usb: dwc2_hc_chhltd_intr_dma: Channel 3 - ChHltd set, but reason is unknown
[ 502.020306] dwc2 20980000.usb: hcint 0x00000002, intsts 0x04600001
[ 502.020319] dwc2 20980000.usb: dwc2_update_urb_state_abn(): trimming xfer length
[ 502.021267] dwc2 20980000.usb: dwc2_hc_chhltd_intr_dma: Channel 0 - ChHltd set, but reason is unknown
[ 502.021293] dwc2 20980000.usb: hcint 0x00000002, intsts 0x04600001
[ 502.021306] dwc2 20980000.usb: dwc2_update_urb_state_abn(): trimming xfer length
[ 502.022279] dwc2 20980000.usb: dwc2_hc_chhltd_intr_dma: Channel 5 - ChHltd set, but reason is unknown
[ 502.022304] dwc2 20980000.usb: hcint 0x00000002, intsts 0x04600001
[ 502.022316] dwc2 20980000.usb: dwc2_update_urb_state_abn(): trimming xfer length
[ 502.031706] usb 1-1.2.3: unable to read config index 0 descriptor/all
[ 502.031752] usb 1-1.2.3: can't read configurations, error -71
In both cases, you have repeated device disconnects. In the Pi Zero case, you are getting a disconnect before the kernel even attempts to load the driver for the device. I suggest verifying with an oscilloscope that Vbus remains within tolerance (5V +-5%) at the device's USB connector.
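For reference, 5 V ±5% works out to a window of 4.75 V to 5.25 V. The short sketch below checks a list of readings against that window; the sample voltages are made up for illustration and are not measurements from this setup.

```python
NOMINAL_VBUS = 5.0  # volts
TOLERANCE = 0.05    # +-5 %, as quoted above


def vbus_limits(nominal=NOMINAL_VBUS, tolerance=TOLERANCE):
    """Return the (low, high) voltage limits for the given tolerance."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def vbus_ok(samples):
    """True if every sampled voltage lies inside the tolerance window."""
    low, high = vbus_limits()
    return all(low <= v <= high for v in samples)


if __name__ == "__main__":
    low, high = vbus_limits()
    print(f"Vbus window: {low:.2f} V to {high:.2f} V")
    # Hypothetical readings: the dip to 4.6 V would be out of tolerance.
    print(vbus_ok([5.02, 4.98, 4.60]))
```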
The reason it gets disconnected is our custom script, which resets the LTE dongle periodically when it has no Internet connection.
@P33M I tested it one more time without our custom script and the device doesn't disconnect. Should we still try to verify Vbus at the device's USB connector, or are there any other suggestions? Thank you in advance!
I am hitting the same issue (albeit on a different kernel/distro). Was there a definitive fix for your issue?
I am assuming the dwc2 errors:
[ 502.022279] dwc2 20980000.usb: dwc2_hc_chhltd_intr_dma: Channel 5 - ChHltd set, but reason is unknown
[ 502.022304] dwc2 20980000.usb: hcint 0x00000002, intsts 0x04600001
did not stop appearing, even after the custom script resetting the dongle periodically was removed.
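As a reading aid for the `hcint` value in these messages, here is a minimal sketch that decodes the host channel interrupt bits. The bit layout is an assumption based on the register definitions in the Linux dwc2 driver as I understand them, and it should be checked against the driver source before relying on it. With that layout, 0x00000002 has only the channel-halted bit set and no bit that gives a cause, which is consistent with the "reason is unknown" wording.

```python
# Assumed HCINT bit layout (to be verified against the dwc2 driver's
# register definitions).
HCINT_BITS = {
    0: "XferCompl",
    1: "ChHltd",
    2: "AHBErr",
    3: "STALL",
    4: "NAK",
    5: "ACK",
    6: "NYET",
    7: "XactErr",
    8: "BblErr",
    9: "FrmOvrun",
    10: "DataTglErr",
}


def decode_hcint(value: int) -> list[str]:
    """Return the names of all interrupt bits set in an hcint value."""
    return [name for bit, name in HCINT_BITS.items() if value & (1 << bit)]


if __name__ == "__main__":
    print(decode_hcint(0x00000002))
```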
I am facing problems similar to Iryna's. I am using a Raspberry Pi CM3 with a blank Pi OS. A Quectel modem is connected via ModemManager and libqmi.
When the modem is reset, the OS continuously prints error messages and becomes unresponsive. The only way out is a power cycle. The error messages are:
My observation is that this only occurs on the 32-bit OS. On the 64-bit Pi OS the error messages also appear, but after a few prints they stop and the system works fine again, so there is no system halt. I also observed that with an active Ethernet connection on 32-bit the same error messages appear, but there is a timeout in place similar to the 64-bit OS.
Are there any other ideas on how to fix this behavior other than using DWC2? Does anyone have experience with potential side effects of using DWC2?