[SP9] Fedora 42 Hanging on boot with surface_hid

Open AegerUSA opened this issue 3 months ago • 10 comments

After pulling in the latest Fedora updates, the Surface kernel with surface_hid sits at a black screen for >2 minutes before reaching the desktop. Blacklisting the surface_hid driver appears to resolve this issue, dropping startup times to ~20 seconds. Unfortunately this solution breaks keyboard functionality.

Looking at dmesg, I see repeated errors from the surface_hid driver:

surface_hid 01:15:02:02:00: unexpected descriptor length: got 0, expected 9
surface_hid 01:15:02:02:00: probe with driver surface_hid failed with error -71
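To check whether a machine hits the same failure, the two messages above can be pulled out of the kernel log. This is only a minimal sketch: the function name `failed_probes` is made up for illustration, and it assumes the message format is exactly the one quoted above.

```shell
#!/bin/sh
# failed_probes (hypothetical helper): read a kernel log on stdin and print
# "<device> <error>" for every failed surface_hid probe.
# Assumes the message format quoted in this issue.
failed_probes() {
    sed -n 's/.*surface_hid \([0-9a-f:]*\): probe with driver surface_hid failed with error \(-[0-9]*\).*/\1 \2/p'
}

# Typical use on a live system:
#   sudo dmesg | failed_probes
#   journalctl -k -b | failed_probes
```

If this prints nothing, the slow boot probably has a different cause than the failed probe described here.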

System details:

  • Device: Microsoft Surface Pro 9
  • Fedora release 42 (Adams)
  • Fedora stock kernel version: kernel-6.16.3-200.fc42.x86_64

Known affected surface kernels:

  • kernel-surface-6.14.2-1.surface.fc42.x86_64
  • kernel-surface-6.15.3-1.surface.fc42.x86_64
  • kernel-surface-6.14.11-1.surface.fc42.x86_64

AegerUSA avatar Sep 02 '25 13:09 AegerUSA

I'm not really sure what's happening here. It looks like surface_hid tries to bind, fails, and then my keyboard/touchpad fall back to hid-generic:

hid-generic 0019:045E:005D.0004: hidraw3: <UNKNOWN> HID v1.11 Device [Microsoft Surface 045E:005D]
hid-generic 0019:045E:099C.0006: hidraw4: <UNKNOWN> HID v1.11 Device [Microsoft Surface 045E:099C]

I've tried delaying the loading of surface_hid by loading it manually from a systemd service, but that just moves the 2 minute black screen to after login. I'm guessing it is still blocking udev from loading other devices. Unfortunately I don't have enough experience in this area to fully understand the issue.

AegerUSA avatar Sep 06 '25 08:09 AegerUSA

It looks like loading the module using a user service (instead of system service) works as a very hacky workaround and gives the system time to load all other drivers beforehand:

Blacklist the surface_hid module:

echo "blacklist surface_hid" | sudo tee /etc/modprobe.d/blacklist-surface-hid.conf
sudo dracut -f

Set up the files:

mkdir -p ~/.config/systemd/user
vim ~/.config/systemd/user/surfacehid.service

Add the content:

[Unit]
Description=Reload surface_hid after login
After=default.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c "/usr/bin/sudo /usr/sbin/modprobe -r surface_hid; sleep 2; /usr/bin/sudo /usr/sbin/modprobe surface_hid"
RemainAfterExit=yes

[Install]
WantedBy=default.target

Enable the service:

systemctl --user daemon-reload
systemctl --user enable surfacehid.service

Set up passwordless sudo access to modprobe:

sudo vim /etc/sudoers.d/surfacehid
<your_username> ALL=(ALL) NOPASSWD: /usr/sbin/modprobe

Reboot

Unfortunately, the keyboard is unavailable during boot with this workaround, so it also requires disabling the LUKS passphrase and entering your login password on the touchscreen.

AegerUSA avatar Sep 06 '25 09:09 AegerUSA

I don't see the same error messages in dmesg, but I see the same overall behavior: a 60 second delay to the login screen and another 60 second delay from login to desktop.

I see this on the 6.16.9-1.surface.fc42.x86_64 kernel on an SP8.

bgoodmansf avatar Oct 11 '25 03:10 bgoodmansf

What specific keyboard model are you using?

Going by the lines below, surface_hid tries to bind to device 01:15:02:02:00 (because it's hard-coded in the SP9 SAM client list), but it doesn't get any HID descriptor as a response.

surface_hid 01:15:02:02:00: unexpected descriptor length: got 0, expected 9
surface_hid 01:15:02:02:00: probe with driver surface_hid failed with error -71

Device 01:15:02:02:00 is the "pen stash", which is found on the flex keyboard and is where you can stash your pen. So I'm guessing you're using a keyboard that doesn't have one, and that therefore doesn't report a valid descriptor.

Assuming that's all correct, we should probably update the surface_hid driver to handle empty responses gracefully: return -ENODEV instead of -EPROTO and simply refuse to probe the device.
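For reference, the -71 in the probe failure is the numeric value of EPROTO, and the proposed ENODEV is 19 on Linux. The sketch below decodes just these two codes; `errno_name` is a hypothetical helper for reading logs, not part of the driver or any existing tool.

```shell
#!/bin/sh
# errno_name (hypothetical helper): map the two kernel error codes discussed
# in this thread to their symbolic names. Accepts "71" or "-71".
errno_name() {
    case "${1#-}" in          # strip a leading minus sign, if any
        19) echo ENODEV ;;    # "No such device"
        71) echo EPROTO ;;    # "Protocol error"
        *)  echo "unknown($1)" ;;
    esac
}

# Example:
#   errno_name -71
```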

What I don't quite understand though is why this would cause a 2 minute timeout...

qzed avatar Oct 11 '25 03:10 qzed

I'm not sure how to tell which keyboard I have. However, I do not see the surface_hid message described by the OP.

One issue I see which might be related to the delays: on Fedora/KDE, when I open the display configuration in System Settings, there is a long delay (~45 seconds) during which it shows a blank panel. Eventually, it displays the proper information for the display. I wonder if something goes wrong while the display is being initialized, which would cause the delay and be unrelated to this surface_hid issue.

bgoodmansf avatar Oct 11 '25 04:10 bgoodmansf

The flex keyboard has a groove to put the pen above the keys (i.e., between keys and connector/screen). That's the "pen stash". The regular one does not.

qzed avatar Oct 11 '25 05:10 qzed

My keyboard has the hidden pocket for the stylus. But note that unlike the original poster, I do not have any messages in dmesg from surface_hid. I am probably seeing a different problem with similar symptoms.

bgoodmansf avatar Oct 11 '25 22:10 bgoodmansf

Hmm okay, then it is likely different. Can you try to investigate what's blocking things via journalctl or something similar?
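One way to narrow this down is to look for large time gaps in the boot log, since a 60 second stall usually shows up as a jump between two consecutive timestamps. The following is a rough sketch, not an established tool: `log_gaps` is a made-up helper name, and it assumes log lines start with a monotonic timestamp in square brackets, as printed by `dmesg` or `journalctl -o short-monotonic`.

```shell
#!/bin/sh
# log_gaps (hypothetical helper): read a log with "[  seconds.micros]" prefixes
# on stdin and report every gap of at least $1 seconds (default 10) between
# consecutive timestamped lines. Lines without a timestamp are ignored.
log_gaps() {
    awk -v min="${1:-10}" '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Take the text between the brackets and convert it to a number.
            ts = substr($0, 2, RLENGTH - 2) + 0
            if (seen && ts - prev >= min)
                printf "%.1fs gap before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }'
}

# Typical use on a live system:
#   journalctl -b -o short-monotonic | log_gaps 30
#   sudo dmesg | log_gaps 30
```

The line printed after each gap is the first thing that happened once the stall ended, so the lines just before it in the full log are the place to look. `systemd-analyze blame` and `systemd-analyze critical-chain` give a complementary per-unit view.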

qzed avatar Oct 12 '25 14:10 qzed

I boot one machine with two different systems, both are Fedora 42 with KDE desktop. Both use a LUKS encrypted system partition.

One system has the surface kernel 6.16.9-1. The other uses the standard kernel 6.17.4-200 but it has the surface keyboard modules included (surface_aggregator surface_aggregator_registry surface_hid_core surface_hid surface_aggregator_hub).
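When comparing two setups like this, it can help to confirm that the same Surface modules are actually loaded on both. A small sketch, assuming the module list is the one named above; `missing_surface_modules` is a hypothetical helper that reads `lsmod` output on stdin.

```shell
#!/bin/sh
# missing_surface_modules (hypothetical helper): read `lsmod` output on stdin
# and print each of the Surface modules named above that is NOT loaded.
missing_surface_modules() {
    # First column of lsmod is the module name; skip the header line.
    loaded=$(awk 'NR > 1 { print $1 }')
    for mod in surface_aggregator surface_aggregator_registry \
               surface_hid_core surface_hid surface_aggregator_hub; do
        printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
    done
}

# Typical use on a live system:
#   lsmod | missing_surface_modules
```

Empty output means all five modules are loaded.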

The standard system takes 12 seconds to reach the login screen from the LUKS password entry. It takes 5 more seconds from the login screen to the desktop display.

The surface system takes 74 seconds to reach the login screen from the LUKS password entry. It takes 67 more seconds from the login screen to the desktop display. So this is 62 seconds additional delay in both cases.

Reviewing the logs for these startups, the main difference I see is some plasma services (plasma-ksplash, plasma-kcminit, xdg-desktop-portal, etc) failing to start. They eventually start again successfully.

These are probably a symptom of the problem rather than the cause. I am not familiar enough with the boot process to do much debugging.

bgoodmansf avatar Oct 30 '25 00:10 bgoodmansf

I just updated to Fedora 43 (kernel 6.17.5-300) and now I see the two long delays with the standard system as well.

Previously, I suspected that the issue might be due to a delay accessing the GPU information. With Fedora 42, there was always a long delay (about 45 sec) when accessing the display configuration panel. The delays during boot seem to be at the times that the GPU handover may be happening.

However, today I see that with Fedora 43 there is no longer a delay to access the display configuration panel.

bgoodmansf avatar Nov 01 '25 20:11 bgoodmansf