qubes-issues icon indicating copy to clipboard operation
qubes-issues copied to clipboard

sys-usb sometimes fails to start

Open hackerfactor opened this issue 3 years ago • 37 comments

Qubes OS version

R4.1beta /etc/qubes-release: Qubes release 4.1 (R4.1)

Names and locations of enabled testing repos

Names of affected testing repo packages

Affected component(s) or functionality

sys-usb, mouse


Brief summary

This appears to be a race condition. It does not happen all the time.

  1. Install Qubes 4.1 (Qubes-R4.1.0-beta1-x86_64.iso).
  2. During the install, I deselect Whonix (don’t need it for my use model) and I toggle sys-firewall and sys-usb to NOT be disposible. (For my needs, I customize sys-usb.)
  3. Reboot. sys-usb does not consistently start up.
  4. Open Qube Manager. Select settings for sys-usb, toggle start on boot (was checked, uncheck it, then recheck it). Click “Apply” and “OK”. From this point on, sys-usb appears to start properly during boot.

The results that I see: Most of the time (about 80%), sys-usb starts properly. About 15% of the time, sys-usb appears to be running, but the mouse (touch pad on laptop) does not respond. Using keystrokes, I can get to the shutdown menu and reboot the system. About 5% of the time, even the keyboard doesn't work. In this case, plugging in USB devices does not even trigger any notification messages. It appears that sys-usb failed to start -- or it started, but failed to acquire the usb bus.

One time, I had no keyboard or mouse. After about 30 seconds of sitting idle, I saw a bunch of notifications as every USB bus and device was recognized. But the keyboard and mouse were still dead.

The laptop is a Dell 5550.

Other people have reported the same problem: https://forum.qubes-os.org/t/sys-usb-not-starting-at-boot/5336/3 https://forum.qubes-os.org/t/sys-usb-not-starting-at-boot/5336/7

To me, this looks like a race condition. sys-usb is not starting up with the hardware is ready, or is not accepting hardware if it appears during the startup.

How reproducible

For me, it's happening about 20% of the time.

Steps to reproduce

See bug description.

Expected behavior

sys-usb should start up every time. Keyboard and mouse should always work.

Actual behavior

See bug description.

Solutions you've tried

Tried disposible and non-disposible USB.

R4.0 on the same laptop has no problems. (Never saw this issue.) It's only in R4.1.

Additional context

The most frustrating part is that it is not easily reproducable. It really acts like a startup race condition.


Relevant documentation you've consulted

Discussions on the forums: https://forum.qubes-os.org/t/sys-usb-not-starting-at-boot/5336

Related, non-duplicate issues

The forum has a report about PCI devices not starting up at boot: https://forum.qubes-os.org/t/after-logging-in-complete-black-screen-in-4-1/5210 If it's a race condition with PCI devices, then it could impact mouse, keyboard, and screen.

hackerfactor avatar Aug 05 '21 21:08 hackerfactor

I think I have a way to recreate this. (Worked 3 times in a row.)

During the boot, you prompted to enter the LUKS password. Before entering any password -- just let it sit at the LUKS password prompt for about a minute. (E.g., boot the laptop but then get distracted by a phone call or cute cat video on a different computer.) Then enter the password to unlock the device. The laptop will boot but the mouse will be dead.

If I do it quickly: boot / password, then the mouse works.

hackerfactor avatar Aug 06 '21 16:08 hackerfactor

FWIW, my solution is to use a keybinding to start up sys-usb when that happens.

Long-standing annoyance.

Ark-of-the-copepod avatar Aug 20 '21 03:08 Ark-of-the-copepod

FWIW, my solution is to use a keybinding to start up sys-usb when that happens.

Can you provide more information about where and how to do this? Specifically: Is this added to dom0 or sys-usb? Where is it added (what file or startup script)? And what is the actual command-line?

hackerfactor avatar Aug 24 '21 13:08 hackerfactor

As another data point, I regularly see this error with 4.1 What is interesting is that I can manually start sys-usb once the system is running.

Looking at the logs, I see two entries of interest:

  1. Configuration file @.***/50_autostart.conf is marked world-inaccessible.
  2. A libvirtd error - failed to create new domain 'sys-usb', unable to reset PCI device 0000:00:14.0

The first isn't important. The second is. I'm not clear on why that device needs to be reset, and why that fails on some occasions. Nor why it can be started later without issue.

unman avatar Aug 24 '21 15:08 unman

This is indeed very interesting. Can you check /var/log/libvirt/libxl/libxl-driver.log for more details about this? And maybe /var/log/xen/console/guest-sys-usb-dm.log (or whatever usbvm name you have), but likely there won't be much info there (I'm hoping for qemu error message).

marmarek avatar Aug 24 '21 16:08 marmarek

@hackerfactor : No problem.

Find the 'Keyboard' option in the Q menu (start menu in dom0), choose 'Application Shortcuts'. In there you can set key-bindings to launch applications but it just writes a dom0 commandline input when you press the shortcut so I use it for this.

qvm-start sys-usb qvm-shutdown sys-usb

I also use this to get Qubes to display my screens properly. 50-75% of the time it forgets on boot it has 2 screens, so a shortcut is easier than xrandr --output VGA1 --primary --auto --output LVDS1 --auto --pos 200x1050

Caution- There may be a better way of doing things, my knowledge is low.

edit sorry. I have a non-usb keyboard in the laptop so that works for sys-usb. If not, the other idea I had was to do some kind of startup autoscript in dom0 that ran qvm-shutdown sys-usb then qvm-start sys-usb every time.

Ark-of-the-copepod avatar Aug 25 '21 14:08 Ark-of-the-copepod

On Tue, Aug 24, 2021 at 09:00:37AM -0700, Marek Marczykowski-Górecki wrote:

This is indeed very interesting. Can you check /var/log/libvirt/libxl/libxl-driver.log for more details about this? And maybe /var/log/xen/console/guest-sys-usb-dm.log (or whatever usbvm name you have), but likely there won't be much info there (I'm hoping for qemu error message).

in /var/log/libvirt/libxl/libxl-driver.log i see:

libxl: libxl_pci.c:1493:libxl_device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0
libxl: libxl_pci.c:1772:libxl_device_pci_add_done: Domain 3:libxl_device_pci_add failed for device 0000:00:14.0

I can post full log if needed.

unman avatar Aug 25 '21 14:08 unman

I just had it happen again.

qvm-ls :: sys-usb is in state "Halted". Running "qvm-start sys-usb" wakes up sys-usb, but the mouse still does not work.

I after sys-usb was up (still no mouse), I bounced sys-usb: qvm-shutdown sys-usb ; sleep 30 ; qvm-start sys-usb No change.

Here's the end of /var/log/libvirt/libxl/libxl-driver.log. You can see the same message each time it boots without a working mouse. (You can also see where I tried to manually bounce it at 15:34:37. No success.)

2021-08-18 18:16:35.978+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-19 19:26:49.761+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-19 21:38:21.868+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-20 15:00:39.194+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-20 17:50:37.595+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-26 15:34:27.194+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-26 15:34:37.214+0000: libxl: libxl_pci.c:1772:device_pci_add_done: Domain 3:libxl__device_pci_add failed for PCI device 0:0:14.0 (rc -9) 2021-08-26 15:34:37.214+0000: libxl: libxl_create.c:1904:domcreate_attach_devices: Domain 3:unable to add pci devices 2021-08-26 15:34:37.351+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-26 15:38:42.039+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0

hackerfactor avatar Aug 26 '21 15:08 hackerfactor

Note to self: Next time it happens, check qvm-pci.

When it boots and the mouse works, dom0:00_14.0 and dom0:38_00.0 are assigned to sys-usb (no-strict-reset=True).

My suspicion: when it fails, they are not assigned to sys-usb. I might be able to manually assign them to make them work.

qvm-pci a work dom0:00_14.0 --persistent -o no-strict-reset=true
qvm-pci a work dom0:38_00.0 --persistent -o no-strict-reset=true

If this works (lots of 'ifs'), then I can put this in one of the startup scripts so that it is detected and fixed during startup.

hackerfactor avatar Aug 26 '21 16:08 hackerfactor

Found it!

If Qubes boots and the mouse does not run, you can restart the mouse driver on dom0:

  1. Find the mouse driver: lsmod | grep touch Mine says "hid_multitouch 32768 0"

  2. Remove it: sudo rmmod hid_multitouch

  3. Add it back: sudo modprobe hid_multitouch

Since this problem happens way too often for me (1 out of every 4 reboots), I've added this to $HOME/.config/xfce4/xinitrc mouse=$(lsmod | grep touch | awk '{print $1}') sudo rmmod "$mouse" sudo modprobe "$mouse" This way, I bounce it every time the system boots -- whether it needs it or not.

(I'm still doing lots of reboots, but this seems to be working.)

The problem seems to be a race condition. I think module hci_multitouch is starting up before the pci bus is ready. By the time the gui is up, restarting this module is enough to make the mouse work.

hackerfactor avatar Aug 29 '21 19:08 hackerfactor

Interesting, is this driver actually connected to your mouse? I don't think it is a mouse driver... You can check it this way:

ls -l /sys/module/hid_multitouch/drivers/*/

While I don't exclude possibility this driver somehow interfere with your mouse, it should not be necessary for normal operation (even if you have a touchscreen). Can you confirm by not loading it back? If so, maybe simply blacklist it via /etc/modprobe.conf.d?

marmarek avatar Aug 29 '21 22:08 marmarek

@marmarek Without my workaround in xinitrc, the mouse is dead about 1 in 4 reboots. (Mouse cursor is on the screen, but touchpad does nothing.)

If I "sudo rmmod hid_multitouch", then the mouse stays dead. If I then use "sudo modprobe hid_multitouch", then the mouse immediately comes alive.

It's a trackpad built into the laptop. (Dell 5550)

hackerfactor avatar Aug 29 '21 23:08 hackerfactor

For a built-in device, it is unlikely to have anything to do with sys-usb then...

marmarek avatar Aug 29 '21 23:08 marmarek

I'm suspecting that it has more to do with the PCI being ready. Maybe it starts slow or doesn't always respond to the first reset?

That would cause sys-usb to sometimes die -- race condition when the PCI is unavailable when sys-usb starts up. It would also cause multitouch to fail -- race condition when the PCI isn't ready during modprobe but comes up before sys-usb is started.

hackerfactor avatar Aug 30 '21 01:08 hackerfactor

Update: I think I have a fix/patch/workaround. I'm doing this in dom0:$HOME/.config/xfce4/xinitrc

# Bounce mouse to ensure it always works. Qubes race condition, bug https://github.com/QubesOS/qubes-issues/issues/6824
lsmod | grep touch | while read a b ; do
  sudo rmmod "$a"
  sudo modprobe "$a"
done

# Check if sys-usb is running. Qubes race condition, bug https://github.com/QubesOS/qubes-issues/issues/6824
SysUSB=$(qvm-ls --raw-data --fields STATE sys-usb)
if [ "$SysUSB" == "Halted" ] ; then
  qvm-start sys-usb
fi

So far, no more dead mice after multiple reboots.

hackerfactor avatar Aug 30 '21 16:08 hackerfactor

2021-08-26 15:38:42.039+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0

Got the same issue here in 4.1rc2 incl. a failure notification, even for exactly the same device (Intel USB Hub). It happens even though no-strict-reset=True is set for it. Sometimes the same happens for sys-net Intel Wifi.

I guess no-strict-reset=True is broken in 4.1rc2.

It works fine in 4.0 and IIRC even in 4.1rc1.

3hhh avatar Nov 24 '21 17:11 3hhh

Is this really isolated to 4.1? I just started noticing that my sys-usb is no longer auto-launching in 4.0...

Osndok avatar Feb 14 '22 15:02 Osndok

I also noticed it. It is relay annoying. Sometimes after boot only thing I can do is shutdown since my keyboard (internal USB) and mouse (external USB dongle) not works. Also it appears not only after boot. It also happens sometimes when woken up or returning from suspend...

Szewcson avatar Mar 18 '22 14:03 Szewcson

What is the current status of this issue?

Szewcson avatar Mar 24 '22 09:03 Szewcson

I am not aware of any changes.

DemiMarie avatar Mar 24 '22 14:03 DemiMarie

If this is a separate issue, let me know and i'll create a new issue out of it, but since its sys-usb race condition and might be related, i'll start with it here:

Sometimes sys-usb does start, but shows no usb devices in sys-usb when using "lsusb" When this happens, it seems to be all 5 sys-usb qubes that have the problem (once sys-usb qube per usb controller). They also may be failing to start at all some of the time. (will document this better the next time it happens)

Possibly related: Someone mentioned a timing related correlation with how long you sit at the LUKS password prompt.
I've created this part as a separate issue, https://github.com/QubesOS/qubes-issues/issues/7434 , but basically: I see a LUKS password prompt, but typing shows no feedback acknowledging that you have entered anything (for example showing a dot for each character entered).
After typing in the LUKS password and hitting enter, the screen does not change. Sometimes after a while (30+ seconds) it will disappear and the user login screen will present

ddevz avatar Apr 11 '22 17:04 ddevz

Same for me, from time to time bultin touchpad doesn't work in q 4.1 kernel 5.10.104-3.fc32.qubes.x86_64 built-in keyboard works and I'm able at least make a restart. Also in /var/log/libvirt/libxl/libxl-driver.log I get: ` 2022-05-01 20:00:31.551+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:0d.0

2022-05-01 20:00:31.555+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:0d.2

2022-05-01 20:00:31.560+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0

2022-05-01 20:00:42.692+0000: libxl: libxl_pci.c:1774:device_pci_add_done: Domain 3:libxl__device_pci_add failed for PCI device 0:0:d.2 (rc -9)

2022-05-01 20:00:42.695+0000: libxl: libxl_pci.c:1774:device_pci_add_done: Domain 3:libxl__device_pci_add failed for PCI device 0:0:14.0 (rc -9)

2022-05-01 20:00:42.695+0000: libxl: libxl_create.c:1904:domcreate_attach_devices: Domain 3:unable to add pci devices

2022-05-01 20:00:42.704+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:0d.0

2022-05-01 20:00:42.855+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:0d.2

2022-05-01 20:00:42.989+0000: libxl: libxl_pci.c:2019:do_pci_remove: Domain 4:xc_physdev_unmap_pirq irq=16: Invalid argument

2022-05-01 20:00:42.990+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0

2022-05-01 20:00:43.256+0000: libxl: libxl_exec.c:120:libxl_report_child_exitstatus: killed fork (dying as expected) [6079] unexpectedly exited status zero The kernel doesn't support reset from sysfs for PCI device 0000:00:0d.2 ` About hit_multitouch: removing it during the normal operation (when everything starts properly - most of the time) results in the touchpad strange behaviour (unexpected action such as clicks/drags etc). But I'm able to move cursor.

turmoil995 avatar May 01 '22 20:05 turmoil995

That is not surprising. Use no_strict_reset for the USB controller.

DemiMarie avatar May 02 '22 00:05 DemiMarie

And if you have already no-strict-reset set (see qvm-pci command) - which is done by default for sys-usb, check /var/log/xen/console/guest-sys-usb-dm.log - there may be more details about this failed PCI attach.

marmarek avatar May 02 '22 00:05 marmarek

i faced the similar issues. error in /var/log/libvirt/libxl/libxl-driver.log is same: libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0

running sudo qubesctl state.sls qvm.sys-usb did not fix this, which is surprising if this is supposed to be done by default for sys-usb.

i needed to set no-strict-reset on all PCI devices attached to sys-usb, not just the PCI device that has that error.

mfc avatar May 20 '22 23:05 mfc

i needed to set no-strict-reset on all PCI devices attached to sys-usb, not just the PCI device that has that error.

If you set no-strict-reset on the one with that error, do you then get an error about a different device?

DemiMarie avatar May 21 '22 15:05 DemiMarie

i needed to set no-strict-reset on all PCI devices attached to sys-usb, not just the PCI device that has that error.

If you set no-strict-reset on the one with that error, do you then get an error about a different device?

no same device. i used GUI to select (sys-usb Qube Settings > Devices > Configure strict reset for PCI devices), maybe using command-line works as expected.

mfc avatar May 21 '22 16:05 mfc

@mfc can you make a separate bug with the details?

DemiMarie avatar May 21 '22 18:05 DemiMarie

Confirmation on relationship between bugs plus proposed interm "solution" (at end):

Even though I know if I reboot and type the LUKS password in quicker, I grabbed some data to try to help with the issue

relationship between bugs It looks to me that both 'sys-usb sometimes fails to start' and 'Sometimes sys-usb does start, but shows no usb devices in sys-usb when using "lsusb"' are both caused by a race condition that happens depending on how long one sits at the LUKS prompt

Normally I experience the 'Sometimes sys-usb does start, but shows no usb devices in sys-usb when using "lsusb"' issue when I let it sit at the LUKS prompt for more then somewhere in the 60 second range or so. However this time, I got the 'sys-usb sometimes fails to start' problem. The big difference is that a power failure happened, causing me to come back to a computer that was already at the LUKS prompt.

From dmesg, we can see that the LUKS prompt sat there for approximately 267339 seconds (I.E. 3 days)

[ 11.720123] [drm] Initialized radeon 2.50.0 20080528 for 0000:b3:00.0 on minor 0 [267350.098945] kauditd_printk_skb: 8 callbacks suppressed [267350.098946] audit: type=1130 audit(1654008572.804:25): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-cryptsetup@luks{{some stuff i'm removing it case it's the password in the log}} comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [267350.953886] audit: type=1130 audit(1654008573.659:26): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [267351.148996] audit: type=1130 audit(1654008573.854:27): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck-root comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [267351.168273] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)

Hopefully knowing that these are related can help in tracking this down

Proposed interm solution:

I propose that until this gets tracked down, that a note be added to the LUKS password screen that mentions that waiting at that screen can cause some people USB issues. Possibly even waiting and only displaying that after 20 seconds if they haven't gotten through the LUKS prompt already.

(I don't know how difficult making UI changes that early in the boot process would be)

ddevz avatar May 31 '22 16:05 ddevz

That is strange. I have no idea why sitting for a long time at the LUKS prompt could cause USB issues. Does it hit some sort of timeout in the hardware, where the hardware expects the OS to initialize it within a certain amount of time, and fails when this does not happen?

DemiMarie avatar May 31 '22 18:05 DemiMarie