Null pointer dereference on USB dock removal
Qubes OS release
Qubes release 4.2.1 (R4.2)
Linux dom0 6.9.2-1.qubes.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sun May 26 05:49:19 GMT 2024 x86_64 x86_64 x86_64 GNU/Linux
Brief summary
Unplugging my HP USB dock crashes the system. Watching a serial console over a USB debugging cable, I found that a null pointer dereference occurs right after unplugging the dock. The machine hangs for 5 seconds and then reboots.
Steps to reproduce
My machine is a Dell Latitude 5500 P80F001, and I'm using this dock from HP.
To reproduce, have the dock plugged in at boot, enter the LUKS password, wait for the system to boot to the login screen, and then unplug the dock.
Running sudo dmesg -W in dom0 gives no output after the crash, so you need an external serial console. Follow the instructions here on how to set this up.
Interestingly, the issue does not occur when booting into Qubes first and then plugging in the dock.
Expected behavior
As described in the related post on the discussion forum, booting straight into dom0 (that is, without Xen) prevents the issue from occurring. This would be the expected behavior.
Actual behavior
Right after unplugging the dock, the system hangs and reboots after 5 seconds.
Output from the serial console (read over a USB debugging cable):
[user@dom0 ~]$ sudo dmesg -W
[ 783.329612] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 783.330130] #PF: supervisor read access in kernel mode
[ 783.330456] #PF: error_code(0x0000) - not-present page
[ 783.330713] PGD 0 P4D 0
[ 783.330844] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 783.331067] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.9.2-1.qubes.fc37.x86_64 #1
[ 783.331453] Hardware name: Dell Inc. Latitude 5500/0M14W7, BIOS 1.13.0 10/06/2021
[ 783.331831] Workqueue: events ucsi_handle_connector_change [typec_ucsi]
[ 783.332174] RIP: e030:strlen+0x4/0x30
[ 783.332363] Code: f7 75 ec 31 c0 c3 cc cc cc cc 48 89 f8 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <80> 3f 00 74 14 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 cc
[ 783.333279] RSP: e02b:ffffc90040077da0 EFLAGS: 00010246
[ 783.333748] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000023c000
[ 783.334111] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 783.334511] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 783.334944] R10: ffff888100931f10 R11: 0000000000000000 R12: 0000000000000000
[ 783.335386] R13: 0000000000000000 R14: ffff8881092cd000 R15: 0000000000000000
[ 783.335807] FS: 0000000000000000(0000) GS:ffff888188200000(0000) knlGS:0000000000000000
[ 783.336272] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 783.336622] CR2: 0000000000000000 CR3: 000000013c5b0000 CR4: 0000000000050660
[ 783.337040] Call Trace:
[ 783.337211] <TASK>
[ 783.337361] ? __die+0x23/0x70
[ 783.337549] ? page_fault_oops+0x95/0x190
Note: this output was captured before I updated the BIOS. The update unfortunately did not resolve the issue.
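For anyone triaging similar traces, here is a minimal sketch that pulls the key fields out of an oops like the one above. The helper name `summarize_oops` and the regular expressions are my own illustration, not part of any kernel or Qubes tooling, and the sample lines are copied from the log in this report. In this trace the fault happens in strlen with RDI (the first-argument register on x86-64) equal to zero, which is consistent with a NULL string being passed down from the typec_ucsi connector-change worker. That reading is an assumption, not a confirmed root cause.

```python
import re

# Sample lines copied from the serial console output in this report.
OOPS_LOG = """\
[ 783.329612] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 783.331831] Workqueue: events ucsi_handle_connector_change [typec_ucsi]
[ 783.332174] RIP: e030:strlen+0x4/0x30
[ 783.334111] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
"""


def summarize_oops(log: str) -> dict:
    """Extract fault address, faulting symbol, work function, module and RDI.

    Hypothetical helper for illustration; fields that are not found are None.
    """
    patterns = {
        "fault_address": r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)",
        "rip_symbol": r"RIP: [0-9a-f]+:(\w+)\+0x",
        "work_function": r"Workqueue: \S+ (\w+)",
        "module": r"Workqueue: \S+ \w+ \[(\w+)\]",
        "rdi": r"RDI: ([0-9a-f]+)",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log)
        summary[key] = match.group(1) if match else None
    return summary


if __name__ == "__main__":
    for key, value in summarize_oops(OOPS_LOG).items():
        print(f"{key}: {value}")
```

Running the same function over a full capture from the serial console makes it easier to compare traces from different machines, for example to check whether they all fault in the same worker.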
Interestingly, I found out today that the problem does not occur when I boot without the dock connected, wait until the user login screen appears (the dom0 login, not the LUKS decryption prompt), and only then connect the USB dock.
This seems to be a sweet spot, because everything works just right this way.
Connecting after logging in is not feasible, because AppVMs are then tied to the screen resolution the laptop had when they were started. For example, the AppVM personal starts while only the laptop display is active. I then connect the dock and open an app in personal. The application can be made full screen on my 2K external monitor, but only the upper-left 1920x1080 area is clickable. Anything outside that area does not respond to clicks.
Thought this would be relevant to add. If this should be made into a separate bug report, please let me know and I will.
Same thing here with a ThinkPad T14 and a USB-C screen (with a built-in USB hub). @bakeromso, FYI, you can fix the resolution problem with this guide, so no VM restart is needed.
Good news! This issue was resolved for me by updating to Qubes 4.3-rc3. If you're doing an upgrade, don't forget to make backups; I needed them.
I'm not sure if this means I should close the issue.
Since this issue affects 4.2, which is a different release from 4.3, it's up to the devs to decide whether they'll fix it for 4.2. In this sort of situation, I usually leave the issue open until a dev makes that decision (or until all affected releases have reached EOL).