linux
linux copied to clipboard
USB failing after several hours - xhci_hcd 0000:01:00.0: Ring expansion failed
Describe the bug
Hi, I was hoping someone here may be able to help me.
I've been using a disk image (Buster) that was working fine with RPI4 R1.4
However since the RPI4 R1.5's have started to filter through I've had to update the system to get the disk image to start up. This seems to have caused an issue with my USB FTDI serial adapter dropping off between maybe 8 - 14 hours and requiring either a hardware reboot or an unplug and re-plug of the serial adapter.
A dmesg while the USB has stopped shows that the system is being flooded by clock change errors ie:
[73254.053318] xhci_hcd 0000:01:00.0: Ring expansion failed
[73254.053346] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[73254.125213] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[73254.209306] xhci_hcd 0000:01:00.0: Ring expansion failed
[73254.209328] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[73254.348480] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
From what I've read it seems like the -12 relates to a memory issue, however I'm running 16GB disk image with 53% free and the memory usage is around 444mb/3.33G. So I can't see that being the issue
Here's a few things I've attempted to get it working: Tried on Kernel 5.10, 5.15 both produce the same issue. Set the coherent_pool=4M in config file to see if that helps by mirroring the older firmware I used - Still happens in same timeframe. Updated Bootloader. Changed PI to another Rev1.5 in case it was a board issue Attempted on an older pi Rev1.4 to see if it was newer board issue, with my updated disk image the same issue occurs. Tried to replace the start4.elf and the other boot files from the later disk image to my older disk image, not including kernel, I might not have done this one correctly but I'm assuming they don't play nicely together being mismatched, (it broke things). Rebuilt on new OS (Bullseye) with latest updates/upgrades, last around 8 hours longer before it drops off again.
This is the closest I could find of a similar issue, which doesn't seem to come to any real conclusion: https://github.com/raspberrypi/linux/issues/3479
Oddly enough trying to run a test script with some neopixel LEDs I get the following message:
raise RuntimeError('ws2811_init failed with code {0} ({1})'.format(resp, str_resp)) RuntimeError: ws2811_init failed with code -2 (Out of memory)
In addition to this I've discovered that if I run the following:
sudo sh -c "echo 1 > /sys/bus/pci/devices/0000:00:00.0/remove"
sudo sh -c "echo 1 > /sys/bus/pci/rescan"
As an attempt to get the USB host to reset, seemed to clear what every memory issues was previously there and allowed me to start the pixel test script and also stopped the error spam with the changing of clock frequency, this however did not redetect the FTDI serial adapter that was connected and it dropped off completely.
My latest attempt at resolving this is now using the 64bit Kernel and I will report back on how this goes.
Would appreciate any help or things that I could try to get this resolved or at the very least narrow down why it's happening in the first place. Many Thanks in advance.
Steps to reproduce the behaviour
Unknown. I plug in USB 232 serial adapter and read from via python script, wait 4-48 hours. USB will stop reading and dmesg gets spammed with errors.
Device (s)
Raspberry Pi 4 Mod. B
System
Previous Working Kernel:
Linux pi 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux
Latest with Issue:
Linux rpi 5.15.32-v7l+ #1538 SMP Thu Mar 31 19:39:41 BST 2022 armv7l GNU/Linux
#lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 2: Dev 3, If 1, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 2: Dev 3, If 0, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 4: Dev 4, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
#lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 003: ID 03eb:214e Atmel Corp.
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Logs
[79710.013177] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79710.176265] xhci_hcd 0000:01:00.0: Ring expansion failed
[79710.176397] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79711.196834] xhci_hcd 0000:01:00.0: Ring expansion failed
[79711.197013] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79711.387118] xhci_hcd 0000:01:00.0: Ring expansion failed
[79711.387220] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79711.590789] xhci_hcd 0000:01:00.0: Ring expansion failed
[79711.590814] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79711.626451] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79711.697099] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79712.557077] xhci_hcd 0000:01:00.0: Ring expansion failed
[79712.557252] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79712.735030] xhci_hcd 0000:01:00.0: Ring expansion failed
[79712.735092] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79712.899047] xhci_hcd 0000:01:00.0: Ring expansion failed
[79712.899074] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79712.996485] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79712.998423] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79713.979218] xhci_hcd 0000:01:00.0: Ring expansion failed
[79713.979277] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79714.169234] xhci_hcd 0000:01:00.0: Ring expansion failed
[79714.169269] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79714.336196] xhci_hcd 0000:01:00.0: Ring expansion failed
[79714.336258] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79715.371434] xhci_hcd 0000:01:00.0: Ring expansion failed
[79715.371485] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79715.543992] xhci_hcd 0000:01:00.0: Ring expansion failed
[79715.544072] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79715.719450] xhci_hcd 0000:01:00.0: Ring expansion failed
[79715.719476] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79716.666478] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79716.715792] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79716.743567] xhci_hcd 0000:01:00.0: Ring expansion failed
[79716.743592] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79716.905628] xhci_hcd 0000:01:00.0: Ring expansion failed
[79716.905654] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79717.067880] xhci_hcd 0000:01:00.0: Ring expansion failed
[79717.067921] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79718.041774] xhci_hcd 0000:01:00.0: Ring expansion failed
[79718.041801] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79718.237846] xhci_hcd 0000:01:00.0: Ring expansion failed
[79718.237914] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79718.401208] xhci_hcd 0000:01:00.0: Ring expansion failed
[79718.401309] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79719.438021] xhci_hcd 0000:01:00.0: Ring expansion failed
[79719.438061] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79719.622485] xhci_hcd 0000:01:00.0: Ring expansion failed
[79719.622574] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79719.817889] xhci_hcd 0000:01:00.0: Ring expansion failed
[79719.818039] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79720.826277] xhci_hcd 0000:01:00.0: Ring expansion failed
[79720.826345] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79721.012315] xhci_hcd 0000:01:00.0: Ring expansion failed
[79721.012344] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79721.184273] xhci_hcd 0000:01:00.0: Ring expansion failed
[79721.184323] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79721.656574] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79721.737147] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79722.174484] xhci_hcd 0000:01:00.0: Ring expansion failed
[79722.174511] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79722.338479] xhci_hcd 0000:01:00.0: Ring expansion failed
[79722.338518] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79722.506646] xhci_hcd 0000:01:00.0: Ring expansion failed
[79722.506800] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79723.498602] xhci_hcd 0000:01:00.0: Ring expansion failed
[79723.498629] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79723.668693] xhci_hcd 0000:01:00.0: Ring expansion failed
[79723.668720] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79723.840706] xhci_hcd 0000:01:00.0: Ring expansion failed
[79723.840730] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79724.826872] xhci_hcd 0000:01:00.0: Ring expansion failed
[79724.826997] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79724.996958] xhci_hcd 0000:01:00.0: Ring expansion failed
[79724.997148] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79725.164853] xhci_hcd 0000:01:00.0: Ring expansion failed
[79725.164909] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79726.193101] xhci_hcd 0000:01:00.0: Ring expansion failed
[79726.193128] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79726.379373] xhci_hcd 0000:01:00.0: Ring expansion failed
[79726.379420] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79726.542444] xhci_hcd 0000:01:00.0: Ring expansion failed
[79726.542517] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79726.706640] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79726.816875] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79727.547519] xhci_hcd 0000:01:00.0: Ring expansion failed
[79727.547667] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79727.727380] xhci_hcd 0000:01:00.0: Ring expansion failed
[79727.727508] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79727.891358] xhci_hcd 0000:01:00.0: Ring expansion failed
[79727.891385] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79728.919568] xhci_hcd 0000:01:00.0: Ring expansion failed
[79728.919635] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79729.101772] xhci_hcd 0000:01:00.0: Ring expansion failed
[79729.101800] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79729.257746] xhci_hcd 0000:01:00.0: Ring expansion failed
[79729.257789] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79730.299676] xhci_hcd 0000:01:00.0: Ring expansion failed
[79730.299723] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79730.336683] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79730.337995] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79730.483732] xhci_hcd 0000:01:00.0: Ring expansion failed
[79768.907163] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79769.050667] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-arm frequency: -12
[79773.267203] raspberrypi_fw_set_rate: 8 callbacks suppressed
[79773.267226] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79774.488489] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79774.607247] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79774.661073] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79774.787223] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
Additional context
dtparam=i2c_arm=on
dtparam=spi=on
[pi4]
# Run as fast as firmware / board allows
arm_boost=1
[all]
enable_uart=1
gpu_mem=256
dtoverlay=vpe-2701c
dtoverlay=i2c-rtc,mcp7940x
dtoverlay=disable-bt
Edit: Formatting
This has been cross-posted to the forum here: https://forums.raspberrypi.com/viewtopic.php?t=336947 Setting gpu_mem is unnecessary on a Pi 4, and you should remove it.
Some condition is causing the xhci driver to think that it needs to expand a transfer ring, which it should almost never need to do. This has been seen before (and fixed) in https://github.com/raspberrypi/linux/issues/3919.
Linux pi 4.19.97-v7l+
Linux rpi 5.15.32-v7l+
Which kernel version is running when the out-of-memory errors occur?
Also, please post the results of the commands in https://github.com/raspberrypi/linux/issues/3919#issuecomment-711958318 .
Thanks for coming back to me on this, I very much appreciate it.
I've currently been running stable for almost 4 days after switching to the 64bit version of the kernel. I'm hoping that this has resolved the issue rather than just delayed it.
When I find a moment I'll try and set up an additional device for testing the original problem and following the steps posted in the above link, certainly an interesting read and I apologise for not being able to find it myself!
I've removed the gpu_mem setting and my applications are running as expected on my test unit, so that's great thanks. Must have been a relic from the past!
Linux rpi 5.15.32-v7l+ is the version I was having problems with, I had the same issues with a 5.10 version as well. The 4.19.97 had been working fine for years running the same application.
As an update, I've run the suggested trbs command on the 64bit currently running build and here's the result:
512 /sys/kernel/debug/usb/xhci/0000:01:00.0/devices/03/ep00/trbs 65536 /sys/kernel/debug/usb/xhci/0000:01:00.0/devices/03/ep02/trbs 512 /sys/kernel/debug/usb/xhci/0000:01:00.0/devices/03/ep03/trbs
Looking at this in comparison to the linked thread, the 65536 seems like its bad and has expanded from the 512 that it should be. At this point I'm assuming I'll be expecting the error to reappear in the next day or 2.
Following in the footsteps of the previous solution:
# lsusb:
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 003: ID 03eb:214e Atmel Corp.
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
# sudo lsusb -v -d 0403:
Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 0
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 8
idVendor 0x0403 Future Technology Devices International, Ltd
idProduct 0x6001 FT232 Serial (UART) IC
bcdDevice 6.00
iManufacturer 1 FTDI
iProduct 2 USB-RS485 Cable
iSerial 3 FT54KAOF
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x0020
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0x80
(Bus Powered)
MaxPower 300mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 2 USB-RS485 Cable
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Device Status: 0x0000
(Bus Powered)
If you need any further information, I'm happy to provide.
Do you have a minimal python script that will run on a basic Pi OS install that stimulates the ring expansion?
So after a hefty bit of playing around, it looks as though for the trbs to expand as expected I need the modbus device I'm using actually connected to the usb serial otherwise I can't seem to replicate the issue. I think if I'm going to stand a chance of getting this fixed I'll have to send in the modbus device that i'm using and the usb converter in your direction, is that at all possible?
Haven't been able to test on a basic pi OS just yet, but my build has very minimal surface level changes.
It may be possible to to use the below script to replicate this if you have a modbus device that you can talk to via rs485, unfortunately I don't have enough hardware available to test alternatives at the moment.
This script seems to do the trick to expand the buffer though, just requires:
python3 -m pip install pymodbus
Should blow through the 512 mark quite rapidly, will loop through 100 reads then check the trbs for a change in size then will print out the size and time it was detected.
from pymodbus.client.sync import ModbusSerialClient
from datetime import datetime
from subprocess import check_output
from time import sleep
# Establish port using pymodbus, timeout at 0.01 to speed things up if not connected to modbus device
port = ModbusSerialClient(method='rtu', port="/dev/ttyUSB0", baudrate=19200, timeout=0.01, parity='N', stopbits=1, bytesize=8)
# Read from an address of the modbus device
def read_serial():
try:
address = 34000
val = port.read_input_registers(address=address, count=1, unit=100)
except Exception as e:
print(e)
class ReadTrbs():
def __init__(self):
self.current_trbs_size = 0
self.read_count = 0
def check_trbs(self):
# After 100 loops of reading modbus, check trbs size
if self.read_count >= 100:
self.read_count = 0
trbs_check = check_output(["sudo", "wc", "-l", "/sys/kernel/debug/usb/xhci/0000:01:00.0/devices/03/ep02/trbs"])
trbs_size = int(trbs_check.decode('utf-8').split(" ")[0])
# Check if trbs size has increased since last check then print out if it has.
if self.current_trbs_size != trbs_size:
self.current_trbs_size = trbs_size
print("TRBS Size Updated - Size:{} - {}".format(trbs_size, datetime.now()))
else:
self.read_count += 1
trbs = ReadTrbs()
# Set current TRBS
trbs.check_trbs()
while True:
read_serial()
trbs.check_trbs()
For the mean time as a temporary fix I've thrown together script that's just checking the trbs size and if it exceeds X size then use usbutils to reset the device which also resets the trbs.
Modbus RTU is just a framed serial protocol. I doubt the actual messages matter, just the fact that bytes are being received by the UART. If you have two rs-485 adapters, can you generate messages on a different Pi and cause the ring expansion on the board running the test script?
We have also bumped into this problem using the Modbus protocol and an FTDI serial adapter.
Other than the proposed USB reset, which we are experimenting with right now (hopefully it will work!), does anyone know of any workarounds or adjustments to kernel/driver parameters that might help? We have several application-based ideas (such as reducing message frequency), but I'd be interested in knowing if anyone has any other ideas.
It must be something specific in how the FTDI driver is manipulating its URBs. RS232-mode FTDI serial converters don't have this issue, or at least nobody has reported that they do, and these are far more widely used than RS485 converters.
I wonder if extending the fix from https://github.com/raspberrypi/linux/commit/df3088386622d9682f74c4efebcd2ca71f4528ef to support this device does anything. If I get some time I might try it out, though I doubt it works.
The patch is entirely driver-agnostic (and mentioned earlier in the thread). This doesn't discount the possibility that there might be some other ring condition that triggers expansion that the patch doesn't cater for.
I saw it mentioned earlier in the thread. However, I thought that because of this:
dev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483
it might be specific for that device. But if you say it is driver agnostic, I believe you.
It must be something specific in how the FTDI driver is manipulating its URBs. RS232-mode FTDI serial converters don't have this issue, or at least nobody has reported that they do, and these are far more widely used than RS485 converters.
We are facing this issue with an 8-port RS232 FTDI serial converter (StarTech ICUSB2328I).
# lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 012: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 011: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 010: ID 1a40:0201 Terminus Technology Inc. FE 2.1 7-port Hub
Bus 001 Device 009: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 008: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 007: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 006: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 005: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 003: ID 1a40:0201 Terminus Technology Inc. FE 2.1 7-port Hub
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/7p, 480M
|__ Port 1: Dev 4, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
|__ Port 2: Dev 5, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
|__ Port 3: Dev 6, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
|__ Port 4: Dev 7, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
|__ Port 5: Dev 8, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
|__ Port 6: Dev 9, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
|__ Port 7: Dev 10, If 0, Class=Hub, Driver=hub/7p, 480M
|__ Port 1: Dev 11, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
|__ Port 2: Dev 12, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
Example dmesg error log after arbitrary uptime:
# journalctl -k -b -1
[...]
Sep 09 06:58:48 rpi kernel: xhci_hcd 0000:01:00.0: Ring expansion failed
Sep 09 06:58:48 rpi kernel: ftdi_sio ttyUSB6: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
Sep 09 06:58:48 rpi kernel: raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-arm frequency: -12
Sep 09 06:58:48 rpi kernel: raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-arm frequency: -12
Sep 09 06:58:49 rpi kernel: xhci_hcd 0000:01:00.0: Ring expansion failed
Sep 09 06:58:49 rpi kernel: ftdi_sio ttyUSB4: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
Sep 09 06:58:49 rpi kernel: xhci_hcd 0000:01:00.0: Ring expansion failed
Sep 09 06:58:49 rpi kernel: ftdi_sio ttyUSB1: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
Sep 09 06:58:49 rpi kernel: raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-arm frequency: -12
[...]
All ports seem to fail simultaneously. usbreset is not a feasible option for us.
This occured on current stock stable kernel for Raspberry Pi OS 32-bit, and persists after compiling and installing kernel branch rpi-5.19.y (HEAD was 4ba8fe5aeb1914bd0594037e44ae8891b80630c3) which includes df3088386622d9682f74c4efebcd2ca71f4528ef.
I am happy to provide any further information as needed.
How are the ports being used? Do you get the out-of-memory condition if a single port is used? What userspace program is accessing the ports?
A Python app spawns a thread per port with a loop structured like this to check if a serial device is connected:
import serial
import time
while True:
try:
with serial.Serial(interface, baudrate, timeout=1) as ser:
ser.write("PING")
res = ser.readline()
except:
pass
time.sleep(2)
I get the OOM condition even if no device is ever connected. The ports just stop working after an arbitrary amount of time. Everything works fine until then, and devices can connect, communicate and disconnect at will. For context, the same Python application has been deployed to Windows boxes with the same 8-port converters for years without this or similar issue. The system has only used about 100MB of its 4GB RAM when the issue has occurred, if that is of interest.
I have not yet tried with only one thread for a single port.
The python fragment appears to be incomplete. Can you post a self-contained testcase that you know will stimulate the bug?
File "serial.py", line 1, in <module>
import serial
File "/home/pi/serial.py", line 5, in <module>
with serial.Serial(interface, baudrate, timeout=1) as ser:
AttributeError: module 'serial' has no attribute 'Serial'
Of course. I will construct one, wait for the error to occur to verify, and edit this comment. EDIT: Successfully recreated the issue with the following python script "test.py", only targeting a single port. The port in question never has a physical device connected.
import serial
import time
import os
while True:
try:
with serial.Serial('/dev/ttyUSB0', 38400, timeout=1) as dev:
dev.write('PING\n'.encode('utf-8'))
response = dev.readline()
except (serial.SerialException) as e:
if str(e).find('Cannot allocate memory'):
print('OOM error has occured. Exiting')
os._exit(1)
pass
time.sleep(0.6)
The error you got from your fragment looks like a naming collision between module "serial" and your filename "serial.py". But to be sure, this is my environment:
# sudo apt install python3 python3-serial
# python3 --version
Python 3.9.2
# python3 test.py
OOM error has occured. Exiting
(after 12-18 hours). Like @Jorl17 this does lots of opens and closes. As per https://github.com/raspberrypi/linux/issues/3919#issuecomment-711958318 here are the trbs afterward:
root# for d in /sys/kernel/debug/usb/xhci/0000:01:00.0/devices/*/*; do if [ -d "$d" ]; then cd $d; echo "${d/?*\/devices\//}: $(wc -l trbs)"; fi done
01/ep00: 512 trbs
01/ep02: 512 trbs
02/ep00: 512 trbs
02/ep02: 512 trbs
03/ep00: 512 trbs
03/ep02: 32768 trbs
03/ep03: 512 trbs
04/ep00: 512 trbs
04/ep02: 512 trbs
04/ep03: 512 trbs
05/ep00: 512 trbs
05/ep02: 512 trbs
05/ep03: 512 trbs
06/ep00: 512 trbs
06/ep02: 512 trbs
06/ep03: 512 trbs
07/ep00: 512 trbs
07/ep02: 512 trbs
07/ep03: 512 trbs
08/ep00: 512 trbs
08/ep02: 512 trbs
08/ep03: 512 trbs
09/ep00: 512 trbs
09/ep02: 512 trbs
10/ep00: 512 trbs
10/ep02: 512 trbs
10/ep03: 512 trbs
11/ep00: 512 trbs
11/ep02: 512 trbs
11/ep03: 512 trbs
I have no expertise regarding what might cause this error in either of the USB drivers (or hardware), but stumbled upon this and couldn't tell from the current implementation if this reset is present.
For reference, this is also our scenario. We have a single serial port (bar the ones we have for an LTE modem) where we are running commands very frequently, and, as they are run, the trbs size just keeps on increasing.
Lowering the frequency of updates has a very clear impact. We changed it from 0.2s to 1s and the trbs is not growing so fast -- though it is certainly growing.
One additional situation we have is that our serial port is being opened and closed very fast, instead of being kept open all the time. Every time we send or receive data, we're opening it and closing it.
usbreset doesn't seem to fix when applied to a single port (using 3-digit bus and device number - BBB/DDD - as argument). Would introducing a separate USB hub change anything - if the issue is VL805-related?
usbresetdoesn't seem to fix when applied to a single port (using 3-digit bus and device number - BBB/DDD - as argument). Would introducing a separate USB hub change anything - if the issue is VL805-related?
No. The device context needs to be destroyed and reallocated for the leak to be reset.
I can replicate the leak with the script posted above (note: github doesn't ping email notifications on comment edits). The first expansion happens in about 2-3 minutes, the second after 10+, so it's either random or the requisite interval doubles each time.
The bug is effectively random, but only occurs immediately after a Link TRB is traversed. I get this odd condition in room_on_ring():
xhci_hcd 0000:01:00.0: room_on_ring B num_trbs_free 248 num_trbs 1 deq_seg 248
Somehow a) we have half the TRBs that we should have in the condition check, and b) the driver thinks there's 248 TRBs in a dequeue segment (the TRBs forming the current TD).
Edit: the link TRB appears to be coincidental. There's a steady leak of num_free_trbs each time we stop and restart the bulk IN endpoint, which eventually breaks the condition check.
Spent several hours looking at trace logs and got mightily confused until I realised there's two "bugs" happening simultaneously.
usb-serial submits multiple URBs for the IN endpoint and this is the primary cause of the leak. In addition to this, there's a race between URB dequeue and resubmission - while the port is being closed, there's nothing stopping the callback from resubmitting the next URB in sequence, but this doesn't result in a leak when the next URB gets dequeued from the transfer ring.
When the port is shut down, the first dequeue request must stop the endpoint ring because the IN urb has been claimed by the xhci hardware. This process happens asynchronously, so other URBs can sneak in afterwards (and ring the endpoint doorbell, undoing the stop-endpoint command). Eventually after 1-2 Stop Endpoint commands the state ends up consistent with no more reubmissions.
Somewhere in the first call to Stop Endpoint and Set TR Dequeue Pointer is where the leak happens. It doesn't take into account the fact that if the dequeue pointer is moved past all of the TDs on the ring (on to the first TRB with a cycle state bit != the consumer cycle state), it needs to return those TRBs in the process.
If I change the number of URBs here: https://github.com/raspberrypi/linux/blob/rpi-5.15.y/include/linux/usb/serial.h#L92 then the bug shifts. A value of 1 prevents a leak, a value of 8 makes the leak happen faster.
Oh and the bug isn't specific to the VL805. The dwc3 xhci controller on the Pi's USB-C port is affected as well.
One obvious bug was in xhci_move_dequeue_past_td() where the ring state was advanced but not accounted for. With this fixed, I still get occasional drops across device open/close which makes me think there's a genuine race condition happening. In particular, I'm suspicious of td->cancel_status which attempts to keep track of whether the xHC has read (and cached) the TD. It has several states, touched in many places in the driver.
I think I've found the cause - there's a large number of gotchas when a URB is removed that is ahead of the ring's current dequeue pointer, i.e. the xHC hasn't yet processed the TD associated with it. This frequently happens if you have a mostly-idle endpoint such as a serial port. Removing an un-processed TD invalidates it by turning it into a no-op instead of moving the dequeue pointer (which was the case above), but doesn't account for the TRBs that got binned.
Strictly speaking, the space isn't "free" because if you subsequently moved the dequeue pointer to the end of the current cycle, or added more TDs before restarting the ring, the free count vs the actual pointer position would diverge.
@Jorl17 @mhartoft can you try compiling from this branch: https://github.com/P33M/linux/tree/trb_leak_fix
and testing with your hardware. Here, on either the vl805 or dwc3 controllers on a Pi 4, I no longer get stray num_trbs_free counts across 1, 2, 8 or 56 usb-serial bulk URBs pending.
Do you know if I can get away with just compiling some specific modules so as to not fully recompile the kernel? From an operational perspective it will be difficult
xhci_hcd is built-in, so /boot/kernelN.img has to be updated, and the machine has to be rebooted. It goes without saying that you shouldn't test this on a board you care about accessing remotely.
@Jorl17 @mhartoft can you try compiling from this branch: https://github.com/P33M/linux/tree/trb_leak_fix
Yes, I am on it
There is no leak detectable in shorter timeframes where it would definitely show without the patch. Will leave it running and continually monitor over the weekend. The ports are functional, and serial communication is flowing 🎉
@P33M Thank you very much for your hard work!
From what you said it seems like this affected all kinds of serial ports, right?
While we noticed this with the FTDI adapter, we also see it (albeit much slower) with a serial associated to an LTE modem.
Thanks again!
The bug is exercised by a driver indirectly manipulating endpoint ring state, and is not limited to a particular class of driver. For example one that frequently issues queued commands over a bulk or control pipe that end in stall responses would also hit the bug.