[SP9] Fedora 42 Hanging on boot with surface_hid
After pulling in the latest Fedora updates, the Surface kernel with surface_hid sits at a black screen for more than 2 minutes before reaching the desktop. Blacklisting the surface_hid driver appears to resolve this, dropping startup time to ~20 seconds. Unfortunately, this also breaks keyboard functionality.
Looking at dmesg, I see repeated errors from the surface_hid driver:
surface_hid 01:15:02:02:00: unexpected descriptor length: got 0, expected 9
surface_hid 01:15:02:02:00: probe with driver surface_hid failed with error -71
System details:
Device: Microsoft Surface Pro 9
Fedora release 42 (Adams)
Fedora stock kernel version: kernel-6.16.3-200.fc42.x86_64
Known affected surface kernels:
- kernel-surface-6.14.2-1.surface.fc42.x86_64
- kernel-surface-6.15.3-1.surface.fc42.x86_64
- kernel-surface-6.14.11-1.surface.fc42.x86_64
I'm not really sure what's happening here. It looks like surface_hid tries to bind, fails, and then my keyboard/touchpad fall back to hid-generic:
hid-generic 0019:045E:005D.0004: hidraw3: <UNKNOWN> HID v1.11 Device [Microsoft Surface 045E:005D]
hid-generic 0019:045E:099C.0006: hidraw4: <UNKNOWN> HID v1.11 Device [Microsoft Surface 045E:099C]
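To check whether a given boot hit the same probe failure, the kernel log can be filtered for the "probe ... failed" line. The following is a minimal sketch, assuming the message format quoted in this post; the function name surface_hid_probe_failures is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<device id> <error code>" for every failed surface_hid probe.
# Assumes the message format quoted in this thread; an optional
# "[ timestamp ]" prefix is ignored.
surface_hid_probe_failures() {
    sed -n -E \
        's/.*surface_hid ([0-9a-f:]+): probe with driver surface_hid failed with error (-?[0-9]+).*/\1 \2/p'
}

# Example (not run here):
#   sudo dmesg | surface_hid_probe_failures
#   journalctl -k -b | surface_hid_probe_failures
```

An empty result would suggest that the boot delay on that machine has a different cause than the failed probe.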
I've tried delaying the loading of surface_hid by loading it manually via a systemd service, but that appears to recreate the 2-minute black screen after login. I'm guessing this is still blocking udev from loading other devices. Unfortunately, I don't have enough experience in this realm to fully understand the issue.
It looks like loading the module using a user service (instead of system service) works as a very hacky workaround and gives the system time to load all other drivers beforehand:
Blacklist the surface_hid module:
echo "blacklist surface_hid" | sudo tee /etc/modprobe.d/blacklist-surface-hid.conf
sudo dracut -f
Set up the files:
mkdir -p ~/.config/systemd/user
vim ~/.config/systemd/user/surfacehid.service
Add the content:
[Unit]
Description=Reload surface_hid after login
After=default.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c "/usr/bin/sudo /usr/sbin/modprobe -r surface_hid; sleep 2; /usr/bin/sudo /usr/sbin/modprobe surface_hid"
RemainAfterExit=yes
[Install]
WantedBy=default.target
Enable the service:
systemctl --user daemon-reload
systemctl --user enable surfacehid.service
Set up passwordless sudo access to modprobe:
sudo vim /etc/sudoers.d/surfacehid
<your_username> ALL=(ALL) NOPASSWD: /usr/sbin/modprobe
Reboot
Unfortunately, this also requires disabling the LUKS passphrase, since the keyboard is unavailable during boot, and entering your password at login using the touchscreen.
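The sudoers entry in the steps above allows any modprobe command to run as root without a password. A narrower rule is probably enough for this workaround. The following is a minimal sketch, assuming the two exact command lines from the unit file are the only ones needed; the function name surfacehid_sudoers_rule is hypothetical.

```shell
# Hypothetical helper: print a sudoers rule that permits only the two
# modprobe invocations used by surfacehid.service for the given user.
# Assumption: modprobe lives at /usr/sbin/modprobe, as in the steps above.
surfacehid_sudoers_rule() {
    printf '%s ALL=(root) NOPASSWD: %s, %s\n' "$1" \
        '/usr/sbin/modprobe -r surface_hid' \
        '/usr/sbin/modprobe surface_hid'
}

# Example (not run here): write the rule to a scratch file and let
# visudo check the syntax before installing it.
#   surfacehid_sudoers_rule "$USER" > /tmp/surfacehid
#   sudo visudo -cf /tmp/surfacehid
#   sudo install -m 0440 -o root -g root /tmp/surfacehid /etc/sudoers.d/surfacehid
```

Because sudoers matches command arguments exactly when they are listed, any other modprobe call would still prompt for a password under this rule.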
I don't see the same error messages in dmesg, but I see the same overall behavior: a 60-second delay to the login screen and another 60-second delay from login to desktop.
I see this with the 6.16.9-1.surface.fc42.x86_64 kernel on an SP8.
What specific keyboard model are you using?
Going by the lines below, surface_hid tries to bind to device 01:15:02:02:00 (because it's hard-coded in the SP9 SAM client list), but it doesn't get any HID descriptor as a response.
surface_hid 01:15:02:02:00: unexpected descriptor length: got 0, expected 9
surface_hid 01:15:02:02:00: probe with driver surface_hid failed with error -71
Device 01:15:02:02:00 is the "pen stash", the slot on the flex keyboard where you can stash your pen (see e.g. here). So I'm guessing you're using a keyboard that doesn't have one, and which therefore doesn't report a proper descriptor.
Assuming that's all correct, we should probably update the surface_hid driver to handle empty responses gracefully: return -ENODEV instead of -EPROTO (the -71 in the log above) and simply refuse to probe.
What I don't quite understand though is why this would cause a 2 minute timeout...
I'm not sure how to tell which keyboard I have. However, I do not see the surface_hid message described by the OP.
One issue I see that might be related to the delays: on Fedora/KDE, when I open the display configuration in System Settings, there is a long delay (~45 seconds) during which it shows a blank panel. Eventually, it displays the proper information for the display. I wonder if something is going wrong while the display is being initialized, which would cause the delay and be unrelated to this surface_hid issue.
The flex keyboard has a groove to put the pen above the keys (i.e., between keys and connector/screen). That's the "pen stash". The regular one does not.
My keyboard has the hidden pocket for the stylus. But note that unlike the original poster, I do not have any messages in dmesg from surface_hid. I am probably seeing a different problem with similar symptoms.
Hmm okay, then it is likely different. Can you try to investigate what's blocking things via journalctl or something similar?
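One way to narrow this down is to look for long silences in the boot journal, since a stall of about 60 seconds should show up as a jump in the timestamps. The following is a minimal sketch, assuming the bracketed timestamp format printed by journalctl -o short-monotonic; the function name find_log_gaps and the default threshold of 10 seconds are made up for illustration.

```shell
# Hypothetical helper: read log lines with "[  seconds.micros]" timestamps
# on stdin and report every line that follows a silence of at least
# $1 seconds (default: 10). Lines without such a timestamp are skipped.
find_log_gaps() {
    awk -v min="${1:-10}" '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the surrounding brackets and convert to a number.
            ts = substr($0, 2, RLENGTH - 2) + 0
            if (seen && ts - prev >= min)
                printf "%.1fs gap before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }
    '
}

# Example (not run here):
#   journalctl -b -o short-monotonic --no-pager | find_log_gaps 30
```

The messages just before and after each reported gap are usually the place to start reading. systemd-analyze blame and systemd-analyze critical-chain give a per-unit view of the same boot and may point at the unit that is waiting.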
I boot one machine with two different systems, both are Fedora 42 with KDE desktop. Both use a LUKS encrypted system partition.
One system has the surface kernel 6.16.9-1. The other uses the standard kernel 6.17.4-200 but it has the surface keyboard modules included (surface_aggregator surface_aggregator_registry surface_hid_core surface_hid surface_aggregator_hub).
The standard system takes 12 seconds to reach the login screen from the LUKS password entry. It takes 5 more seconds from the login screen to the desktop display.
The surface system takes 74 seconds to reach the login screen from the LUKS password entry. It takes 67 more seconds from the login screen to the desktop display. That is an additional 62 seconds in both cases (74 - 12 and 67 - 5).
Reviewing the logs for these startups, the main difference I see is some plasma services (plasma-ksplash, plasma-kcminit, xdg-desktop-portal, etc) failing to start. They eventually start again successfully.
These are probably a symptom of the problem rather than the cause. I am not familiar enough with the boot process to do much debugging.
I just updated to Fedora 43 (kernel 6.17.5-300) and now I see the two long delays with the standard system as well.
Previously, I suspected that the issue might be due to a delay accessing the GPU information. With Fedora 42, there was always a long delay (about 45 sec) when accessing the display configuration panel. The delays during boot seem to be at the times that the GPU handover may be happening.
However, today I see that with Fedora 43 there is no longer a delay to access the display configuration panel.