qubes-issues
qubes-issues copied to clipboard
sys-usb sometimes fails to start
Qubes OS version
R4.1beta /etc/qubes-release: Qubes release 4.1 (R4.1)
Names and locations of enabled testing repos
Names of affected testing repo packages
Affected component(s) or functionality
sys-usb, mouse
Brief summary
This appears to be a race condition. It does not happen all the time.
- Install Qubes 4.1 (Qubes-R4.1.0-beta1-x86_64.iso).
- During the install, I deselect Whonix (don’t need it for my use model) and I toggle sys-firewall and sys-usb to NOT be disposible. (For my needs, I customize sys-usb.)
- Reboot. sys-usb does not consistently start up.
- Open Qube Manager. Select settings for sys-usb, toggle start on boot (was checked, uncheck it, then recheck it). Click “Apply” and “OK”. From this point on, sys-usb appears to start properly during boot.
The results that I see: Most of the time (about 80%), sys-usb starts properly. About 15% of the time, sys-usb appears to be running, but the mouse (touch pad on laptop) does not respond. Using keystrokes, I can get to the shutdown menu and reboot the system. About 5% of the time, even the keyboard doesn't work. In this case, plugging in USB devices does not even trigger any notification messages. It appears that sys-usb failed to start -- or it started, but failed to acquire the usb bus.
One time, I had no keyboard or mouse. After about 30 seconds of sitting idle, I saw a bunch of notifications as every USB bus and device was recognized. But the keyboard and mouse were still dead.
The laptop is a Dell 5550.
Other people have reported the same problem: https://forum.qubes-os.org/t/sys-usb-not-starting-at-boot/5336/3 https://forum.qubes-os.org/t/sys-usb-not-starting-at-boot/5336/7
To me, this looks like a race condition. sys-usb is not starting up with the hardware is ready, or is not accepting hardware if it appears during the startup.
How reproducible
For me, it's happening about 20% of the time.
Steps to reproduce
See bug description.
Expected behavior
sys-usb should start up every time. Keyboard and mouse should always work.
Actual behavior
See bug description.
Solutions you've tried
Tried disposible and non-disposible USB.
R4.0 on the same laptop has no problems. (Never saw this issue.) It's only in R4.1.
Additional context
The most frustrating part is that it is not easily reproducable. It really acts like a startup race condition.
Relevant documentation you've consulted
Discussions on the forums: https://forum.qubes-os.org/t/sys-usb-not-starting-at-boot/5336
Related, non-duplicate issues
The forum has a report about PCI devices not starting up at boot: https://forum.qubes-os.org/t/after-logging-in-complete-black-screen-in-4-1/5210 If it's a race condition with PCI devices, then it could impact mouse, keyboard, and screen.
I think I have a way to recreate this. (Worked 3 times in a row.)
During the boot, you prompted to enter the LUKS password. Before entering any password -- just let it sit at the LUKS password prompt for about a minute. (E.g., boot the laptop but then get distracted by a phone call or cute cat video on a different computer.) Then enter the password to unlock the device. The laptop will boot but the mouse will be dead.
If I do it quickly: boot / password, then the mouse works.
FWIW, my solution is to use a keybinding to start up sys-usb when that happens.
Long-standing annoyance.
FWIW, my solution is to use a keybinding to start up sys-usb when that happens.
Can you provide more information about where and how to do this? Specifically: Is this added to dom0 or sys-usb? Where is it added (what file or startup script)? And what is the actual command-line?
As another data point, I regularly see this error with 4.1 What is interesting is that I can manually start sys-usb once the system is running.
Looking at the logs, I see two entries of interest:
- Configuration file @.***/50_autostart.conf is marked world-inaccessible.
- A libvirtd error - failed to create new domain 'sys-usb', unable to reset PCI device 0000:00:14.0
The first isn't important. The second is. I'm not clear on why that device needs to be reset, and why that fails on some occasions. Nor why it can be started later without issue.
This is indeed very interesting. Can you check /var/log/libvirt/libxl/libxl-driver.log
for more details about this? And maybe /var/log/xen/console/guest-sys-usb-dm.log
(or whatever usbvm name you have), but likely there won't be much info there (I'm hoping for qemu error message).
@hackerfactor : No problem.
Find the 'Keyboard' option in the Q menu (start menu in dom0), choose 'Application Shortcuts'. In there you can set key-bindings to launch applications but it just writes a dom0 commandline input when you press the shortcut so I use it for this.
qvm-start sys-usb
qvm-shutdown sys-usb
I also use this to get Qubes to display my screens properly. 50-75% of the time it forgets on boot it has 2 screens, so a shortcut is easier than
xrandr --output VGA1 --primary --auto --output LVDS1 --auto --pos 200x1050
Caution- There may be a better way of doing things, my knowledge is low.
edit sorry. I have a non-usb keyboard in the laptop so that works for sys-usb. If not, the other idea I had was to do some kind of startup autoscript in dom0 that ran qvm-shutdown sys-usb
then qvm-start sys-usb
every time.
On Tue, Aug 24, 2021 at 09:00:37AM -0700, Marek Marczykowski-Górecki wrote:
This is indeed very interesting. Can you check
/var/log/libvirt/libxl/libxl-driver.log
for more details about this? And maybe/var/log/xen/console/guest-sys-usb-dm.log
(or whatever usbvm name you have), but likely there won't be much info there (I'm hoping for qemu error message).
in /var/log/libvirt/libxl/libxl-driver.log
i see:
libxl: libxl_pci.c:1493:libxl_device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0
libxl: libxl_pci.c:1772:libxl_device_pci_add_done: Domain 3:libxl_device_pci_add failed for device 0000:00:14.0
I can post full log if needed.
I just had it happen again.
qvm-ls :: sys-usb is in state "Halted". Running "qvm-start sys-usb" wakes up sys-usb, but the mouse still does not work.
I after sys-usb was up (still no mouse), I bounced sys-usb:
qvm-shutdown sys-usb ; sleep 30 ; qvm-start sys-usb
No change.
Here's the end of /var/log/libvirt/libxl/libxl-driver.log. You can see the same message each time it boots without a working mouse. (You can also see where I tried to manually bounce it at 15:34:37. No success.)
2021-08-18 18:16:35.978+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-19 19:26:49.761+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-19 21:38:21.868+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-20 15:00:39.194+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-20 17:50:37.595+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-26 15:34:27.194+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-26 15:34:37.214+0000: libxl: libxl_pci.c:1772:device_pci_add_done: Domain 3:libxl__device_pci_add failed for PCI device 0:0:14.0 (rc -9) 2021-08-26 15:34:37.214+0000: libxl: libxl_create.c:1904:domcreate_attach_devices: Domain 3:unable to add pci devices 2021-08-26 15:34:37.351+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0 2021-08-26 15:38:42.039+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0
Note to self: Next time it happens, check qvm-pci.
When it boots and the mouse works, dom0:00_14.0 and dom0:38_00.0 are assigned to sys-usb (no-strict-reset=True).
My suspicion: when it fails, they are not assigned to sys-usb. I might be able to manually assign them to make them work.
qvm-pci a work dom0:00_14.0 --persistent -o no-strict-reset=true
qvm-pci a work dom0:38_00.0 --persistent -o no-strict-reset=true
If this works (lots of 'ifs'), then I can put this in one of the startup scripts so that it is detected and fixed during startup.
Found it!
If Qubes boots and the mouse does not run, you can restart the mouse driver on dom0:
-
Find the mouse driver: lsmod | grep touch Mine says "hid_multitouch 32768 0"
-
Remove it: sudo rmmod hid_multitouch
-
Add it back: sudo modprobe hid_multitouch
Since this problem happens way too often for me (1 out of every 4 reboots), I've added this to $HOME/.config/xfce4/xinitrc mouse=$(lsmod | grep touch | awk '{print $1}') sudo rmmod "$mouse" sudo modprobe "$mouse" This way, I bounce it every time the system boots -- whether it needs it or not.
(I'm still doing lots of reboots, but this seems to be working.)
The problem seems to be a race condition. I think module hci_multitouch is starting up before the pci bus is ready. By the time the gui is up, restarting this module is enough to make the mouse work.
Interesting, is this driver actually connected to your mouse? I don't think it is a mouse driver... You can check it this way:
ls -l /sys/module/hid_multitouch/drivers/*/
While I don't exclude possibility this driver somehow interfere with your mouse, it should not be necessary for normal operation (even if you have a touchscreen). Can you confirm by not loading it back? If so, maybe simply blacklist it via /etc/modprobe.conf.d
?
@marmarek Without my workaround in xinitrc, the mouse is dead about 1 in 4 reboots. (Mouse cursor is on the screen, but touchpad does nothing.)
If I "sudo rmmod hid_multitouch", then the mouse stays dead. If I then use "sudo modprobe hid_multitouch", then the mouse immediately comes alive.
It's a trackpad built into the laptop. (Dell 5550)
For a built-in device, it is unlikely to have anything to do with sys-usb then...
I'm suspecting that it has more to do with the PCI being ready. Maybe it starts slow or doesn't always respond to the first reset?
That would cause sys-usb to sometimes die -- race condition when the PCI is unavailable when sys-usb starts up. It would also cause multitouch to fail -- race condition when the PCI isn't ready during modprobe but comes up before sys-usb is started.
Update: I think I have a fix/patch/workaround. I'm doing this in dom0:$HOME/.config/xfce4/xinitrc
# Bounce mouse to ensure it always works. Qubes race condition, bug https://github.com/QubesOS/qubes-issues/issues/6824
lsmod | grep touch | while read a b ; do
sudo rmmod "$a"
sudo modprobe "$a"
done
# Check if sys-usb is running. Qubes race condition, bug https://github.com/QubesOS/qubes-issues/issues/6824
SysUSB=$(qvm-ls --raw-data --fields STATE sys-usb)
if [ "$SysUSB" == "Halted" ] ; then
qvm-start sys-usb
fi
So far, no more dead mice after multiple reboots.
2021-08-26 15:38:42.039+0000: libxl: libxl_pci.c:1493:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0
Got the same issue here in 4.1rc2 incl. a failure notification, even for exactly the same device (Intel USB Hub). It happens even though no-strict-reset=True is set for it. Sometimes the same happens for sys-net Intel Wifi.
I guess no-strict-reset=True is broken in 4.1rc2.
It works fine in 4.0 and IIRC even in 4.1rc1.
Is this really isolated to 4.1? I just started noticing that my sys-usb is no longer auto-launching in 4.0...
I also noticed it. It is relay annoying. Sometimes after boot only thing I can do is shutdown since my keyboard (internal USB) and mouse (external USB dongle) not works. Also it appears not only after boot. It also happens sometimes when woken up or returning from suspend...
What is the current status of this issue?
I am not aware of any changes.
If this is a separate issue, let me know and i'll create a new issue out of it, but since its sys-usb race condition and might be related, i'll start with it here:
Sometimes sys-usb does start, but shows no usb devices in sys-usb when using "lsusb" When this happens, it seems to be all 5 sys-usb qubes that have the problem (once sys-usb qube per usb controller). They also may be failing to start at all some of the time. (will document this better the next time it happens)
Possibly related:
Someone mentioned a timing related correlation with how long you sit at the LUKS password prompt.
I've created this part as a separate issue, https://github.com/QubesOS/qubes-issues/issues/7434 , but basically:
I see a LUKS password prompt, but typing shows no feedback acknowledging that you have entered anything (for example showing a dot for each character entered).
After typing in the LUKS password and hitting enter, the screen does not change. Sometimes after a while (30+ seconds) it will disappear and the user login screen will present
Same for me, from time to time bultin touchpad doesn't work in q 4.1 kernel 5.10.104-3.fc32.qubes.x86_64 built-in keyboard works and I'm able at least make a restart. Also in /var/log/libvirt/libxl/libxl-driver.log I get: ` 2022-05-01 20:00:31.551+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:0d.0
2022-05-01 20:00:31.555+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:0d.2
2022-05-01 20:00:31.560+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0
2022-05-01 20:00:42.692+0000: libxl: libxl_pci.c:1774:device_pci_add_done: Domain 3:libxl__device_pci_add failed for PCI device 0:0:d.2 (rc -9)
2022-05-01 20:00:42.695+0000: libxl: libxl_pci.c:1774:device_pci_add_done: Domain 3:libxl__device_pci_add failed for PCI device 0:0:14.0 (rc -9)
2022-05-01 20:00:42.695+0000: libxl: libxl_create.c:1904:domcreate_attach_devices: Domain 3:unable to add pci devices
2022-05-01 20:00:42.704+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:0d.0
2022-05-01 20:00:42.855+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:0d.2
2022-05-01 20:00:42.989+0000: libxl: libxl_pci.c:2019:do_pci_remove: Domain 4:xc_physdev_unmap_pirq irq=16: Invalid argument
2022-05-01 20:00:42.990+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0
2022-05-01 20:00:43.256+0000: libxl: libxl_exec.c:120:libxl_report_child_exitstatus: killed fork (dying as expected) [6079] unexpectedly exited status zero The kernel doesn't support reset from sysfs for PCI device 0000:00:0d.2 ` About hit_multitouch: removing it during the normal operation (when everything starts properly - most of the time) results in the touchpad strange behaviour (unexpected action such as clicks/drags etc). But I'm able to move cursor.
That is not surprising. Use no_strict_reset for the USB controller.
And if you have already no-strict-reset set (see qvm-pci
command) - which is done by default for sys-usb, check /var/log/xen/console/guest-sys-usb-dm.log - there may be more details about this failed PCI attach.
i faced the similar issues. error in /var/log/libvirt/libxl/libxl-driver.log
is same:
libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0
running sudo qubesctl state.sls qvm.sys-usb
did not fix this, which is surprising if this is supposed to be done by default for sys-usb.
i needed to set no-strict-reset
on all PCI devices attached to sys-usb
, not just the PCI device that has that error.
i needed to set
no-strict-reset
on all PCI devices attached tosys-usb
, not just the PCI device that has that error.
If you set no-strict-reset
on the one with that error, do you then get an error about a different device?
i needed to set
no-strict-reset
on all PCI devices attached tosys-usb
, not just the PCI device that has that error.If you set
no-strict-reset
on the one with that error, do you then get an error about a different device?
no same device. i used GUI to select (sys-usb
Qube Settings > Devices > Configure strict reset for PCI devices), maybe using command-line works as expected.
@mfc can you make a separate bug with the details?
Confirmation on relationship between bugs plus proposed interm "solution" (at end):
Even though I know if I reboot and type the LUKS password in quicker, I grabbed some data to try to help with the issue
relationship between bugs It looks to me that both 'sys-usb sometimes fails to start' and 'Sometimes sys-usb does start, but shows no usb devices in sys-usb when using "lsusb"' are both caused by a race condition that happens depending on how long one sits at the LUKS prompt
Normally I experience the 'Sometimes sys-usb does start, but shows no usb devices in sys-usb when using "lsusb"' issue when I let it sit at the LUKS prompt for more then somewhere in the 60 second range or so. However this time, I got the 'sys-usb sometimes fails to start' problem. The big difference is that a power failure happened, causing me to come back to a computer that was already at the LUKS prompt.
From dmesg, we can see that the LUKS prompt sat there for approximately 267339 seconds (I.E. 3 days)
[ 11.720123] [drm] Initialized radeon 2.50.0 20080528 for 0000:b3:00.0 on minor 0 [267350.098945] kauditd_printk_skb: 8 callbacks suppressed [267350.098946] audit: type=1130 audit(1654008572.804:25): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-cryptsetup@luks{{some stuff i'm removing it case it's the password in the log}} comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [267350.953886] audit: type=1130 audit(1654008573.659:26): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [267351.148996] audit: type=1130 audit(1654008573.854:27): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck-root comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [267351.168273] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
Hopefully knowing that these are related can help in tracking this down
Proposed interm solution:
I propose that until this gets tracked down, that a note be added to the LUKS password screen that mentions that waiting at that screen can cause some people USB issues. Possibly even waiting and only displaying that after 20 seconds if they haven't gotten through the LUKS prompt already.
(I don't know how difficult making UI changes that early in the boot process would be)
That is strange. I have no idea why sitting for a long time at the LUKS prompt could cause USB issues. Does it hit some sort of timeout in the hardware, where the hardware expects the OS to initialize it within a certain amount of time, and fails when this does not happen?