illumos-joyent
panic on boot with "XHCI runtime reset required"
I recently upgraded to 20170303 and then to 20170315. On boot, apparently before the zvol is up (judging by my attempt to take a system dump), I get a panic.
WARNING: xhci1: abort command timed out: resetting device
panic[cpu2]/thread=ffffff001ea81c40: XHCI runtime reset required
Warning - stack not written to the dump buffer
ffffff001ea81b60 xhci:xhci_soft_state+37c103b2 ()
ffffff001ea81c20 genunix:taskq_thread+2d0 ()
ffffff001ea81c30 unix:thread_start+8 ()
(the above is typed by hand, as I unfortunately have no serial on this system. typos are all my own)
I received some help from bahamat in #smartos who suggested filing here. I'll see if I can get some more info out of kmdb on what's in scope, etc.
Update: poking around with my limited mdb, I don't see that thread on any of the CPUs any longer. Let me know if there is any info I can gather from the system to help with diagnosis. Thanks in advance.
The most useful starting point is to run the following kinds of commands:
::stacks -m xhci
::stacks -m usba
::prtusb
Can you also provide any info about what kind of system you're using?
The system is an Intel X79 chipset that I assembled specifically to run SmartOS years ago. It's a Gigabyte X79-UD5.

I believe the only device attached to a USB 3 port is a USB 2 drive: the boot media.
I also grabbed ::vars from the thread address. Let me know if there's anything you need there.
The BIOS had a setting for XHCI handoff, which was enabled. I disabled it, but no change. I was also able to disable XHCI though, which let me work around it for now. I'd still like to help you get to the bottom of it if useful though.
We definitely want to get to the bottom of this. Two things will be useful here. First, could you run ::prtusb when the system has xhci disabled, just so we can compare? Second, the next time this happens, run the same ::stacks -m xhci, then take the thread address that it displays and run ::findstack -v on it. In this case we'd run ffffff001e928c40::findstack -v. Note that the thread address will almost certainly change on the next boot, so you can't use the exact address I have there verbatim. This will help tell us what it's hanging on while trying to enable the device, and then we can start figuring out what's going on.
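For anyone collecting this output from a capture, here is a minimal Python sketch of the "take the thread address and run ::findstack -v" step. It assumes the ::stacks output has been saved as text and that each thread summary line starts with a 16-digit hexadecimal thread address. The function name findstack_commands is hypothetical and is not part of mdb.

```python
import re

# Assumption: a thread summary line in saved ::stacks output begins with a
# 16-digit hexadecimal thread address followed by whitespace.
THREAD_LINE = re.compile(r"^\s*([0-9a-f]{16})\s", re.MULTILINE)


def findstack_commands(stacks_output):
    """Return one '<addr>::findstack -v' command per distinct thread address."""
    seen = []
    for addr in THREAD_LINE.findall(stacks_output):
        if addr not in seen:  # keep first-seen order, drop repeats
            seen.append(addr)
    return [addr + "::findstack -v" for addr in seen]


if __name__ == "__main__":
    # Illustrative input only; addresses differ on every boot.
    sample = (
        "THREAD           STATE    SOBJ                COUNT\n"
        "ffffff001e928c40 SLEEP    CV                      1\n"
        "                 swtch+0x133\n"
    )
    for cmd in findstack_commands(sample):
        print(cmd)
```

The printed commands still have to be typed at the mdb or kmdb prompt by hand; the sketch only saves re-reading the addresses.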
Great, will get you that info when I am able. It'll probably be later this evening.
So, interesting observation just now… there's no problem if USB3/XHCI is enabled in the BIOS but the boot device is on the USB 2.0 controller.
Booting with it on the USB3 just to get info…

And then booting with xhci enabled and the boot media plugged into USB 2…
# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba xhci mm stmf_sbd stmf zfs sd lofs idm sata random cpc logindmux ptm sppp nfs ]
> ::prtusb
INDEX DRIVER INST NODE GEN VID.PID PRODUCT
1 xhci 0 pci1458,5007 3.0 0000.0000 No Product String
2 xhci 1 pci1458,5007 3.0 0000.0000 No Product String
3 ehci 0 pci1458,5006 2.0 0000.0000 No Product String
4 ehci 1 pci1458,5006 2.0 0000.0000 No Product String
5 hubd 0 hub 2.0 8087.0024 No Product String
6 hubd 1 hub 2.0 8087.0024 No Product String
7 hid 0 keyboard 1.0 0430.0005 No Product String
8 usb_mid 0 device 1.1 0cf3.3008 Bluetooth USB Host Controller
9 scsa2usb 0 storage 2.0 058f.6387 Mass Storage
Note I booted under mdb, but since I didn't crash I just grabbed it from multiuser.
I'd booted from this same USB drive on older SmartOS releases in a USB3 port, so something has changed.
And… of course… there's no irony at all in the USB drive I selected:

@rmustacc any additional info I can assist with? no rush from my perspective, but just wanted to be sure you have what you need.
@rmustacc anything I can do to help diagnose and/or any fixes? thanks in advance
Updated to 20211216T012707Z. Booting with OS media in USB 2 and a USB 3 attached disk on one of the USB 3.0 ports, I continue to see…
WARNING: xhci1: abort command timed out: resetting device
panic[cpu0]/thread=fffffe005c287c20: XHCI runtime reset required.

OCR'd:
[8]> ::stacks -m xhci
fffffe885c14cc28 SLEEP CV
                 swtch+0x133
                 cv_wait+0x68
                 xhci`xhci_command_submit+0x12b
                 xhci`xhci_command_enable_slot+0x4e
                 xhci`xhci_hcdi_device_init+0x1b3
                 usba`hubd_create_child+0x243
                 usba`hubd_handle_port_connect+0x482
                 usba`hubd_hotplug_thread+0x3d3
                 taskq_d_thread+0xbc
                 thread_start+0xb
[8]> fffffe885c14cc28::findstack -v
stack pointer for thread fffffe885c14cc28 (tq:system_taskq): fffffe805c14c630
[ fffffe805c14c630 _resume_from_idle+0x12b()
  fffffe885c14c668 swtch+0x133() ]
  fffffe805c14c6a8 cv_wait+0x68(fffffe805c14c728, fffffe430b9672f8)
  fffffe805c14c6f8 xhci`xhci_command_submit+0x12b(fffffe438b966080, fffffe805c14c710)
  fffffe805c14c778 xhci`xhci_command_enable_slot+0x4e(fffffe438b966800, fffffe43197bd012)
  fffffe805c14c878 xhci`xhci_hcdi_device_init+0x1b3(fffffe4319638a88, 3, fffffe805c14c948)
  fffffe885c14ca18 usba`hubd_create_child+0x243(fffffe43153326a8, fffffe4318b895c8, fffffe4318e68b88, 4, 3, 8)
  fffffe805c14cab0 usba`hubd_handle_port_connect+0x482(fffffe4318b895c8, 3)
  fffffe805c14cb60 usba`hubd_hotplug_thread+0x3d3(fffffe4318de9ac8)
  fffffe805c14cc00 taskq_d_thread+0xbc(fffffe4315f53720)
  fffffe805c14cc18 thread_start+0xb()
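Since this output came from OCR of a console photo, one recurring misread is the 0 in a symbol offset prefix coming out as 8 or o (text such as +8x4e where +0x4e is meant). Below is a small Python sketch, under that assumption, for normalizing such offsets before pasting. The helper name fix_offsets is hypothetical. It deliberately leaves addresses alone, because a misread digit inside an address cannot be recovered from the text.

```python
import re

# Matches a symbol offset such as "+0x12b", tolerating the OCR misreads
# "+8x12b" / "+ox12b" and stray spaces around the plus sign.
OFFSET = re.compile(r"\s*\+\s*[08oO]x([0-9a-fA-F]+)")


def fix_offsets(text):
    """Rewrite every symbol offset in text to the canonical '+0x<hex>' form."""
    return OFFSET.sub(lambda m: "+0x" + m.group(1).lower(), text)


if __name__ == "__main__":
    print(fix_offsets("cv_wait +8x68"))
    print(fix_offsets("taskq_d_thread+8xbc (fffffe4315f53720)"))
```

Only the offset prefix is rewritten; anything else that looks wrong (function names, addresses) still needs to be checked against the original screen.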
@rmustacc any details I can get to understand this issue better? I'm glad to go in and get some additional information or pair up and do so if it'd help. Feel free to give me some pointers to what source and what kind of poking around with mdb would be of help.
Sorry to make you reproduce this again, but seeing the function stack arguments via $C from kmdb on the actual panicking thread will help correlate which threads are doing what.
@danmcd no worries at all-- glad to repro it as many times as needed to try to fix it. I'll get some more info here soon and report back.
NOTE: I'm upstreaming https://www.illumos.org/issues/14464 to make whatever we find here a completely generic illumos fix (not that I think 14464's code from SmartOS is causing this problem... it just eliminates all doubt if we upstream it).