
Enable PCIe hotplug in dom0

Open • DemiMarie opened this issue 4 years ago • 34 comments

The problem you're addressing (if any)

PCIe hotplug is currently disabled in dom0. This causes breakage on some laptops and prevents Thunderbolt from being used, even though a Thunderbolt eGPU on recent hardware is the most secure method I know of to get hardware-accelerated graphics in a qube.

Describe the solution you'd like

We should enable PCIe hotplug in dom0.

Where is the value to a user, and who might that user be?

Many users, including our own @fepitre.

Describe alternatives you've considered

None

Additional context

Previously, having PCIe hotplug enabled in the dom0 kernel was considered a security risk, but Xen developers have indicated that it is not.

Relevant documentation you've consulted

Related, non-duplicate issues

#4353, #5522, #5453

DemiMarie avatar May 19 '21 20:05 DemiMarie

Do you have links to the Xen discussions about the change in security posture regarding Thunderbolt and/or PCIe hotplug?

Personally, I want this... but only if it is reasonably safe.

brendanhoar avatar May 19 '21 20:05 brendanhoar

Historically, Xen assigned all new devices to dom0 by default (at least IOMMU-wise). Since XSA-302, it has gained support for a quarantine IOMMU domain, which should (theoretically) be used instead. This should indeed make it reasonably safe to re-enable PCI hotplug. What remains to be done is:

  • verifying that newly plugged-in PCI devices are indeed assigned to the IOMMU quarantine domain (xl debug-key Q && xl dmesg should show that), and
  • verifying that no dom0 component (especially the kernel and toolstack) tries to move such a freshly connected device to dom0; preventing the relevant driver from loading automatically (and attaching the device to xen-pciback instead) could be desirable too.
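For the second check, a quick way to see which dom0 driver (if any) has claimed a freshly plugged device is to read its sysfs `driver` symlink. This is a minimal sketch, assuming the standard Linux layout `/sys/bus/pci/devices/<BDF>/driver` and that xen-pciback registers under the driver name `pciback`; the BDF you pass in is whatever `lspci -D` reports for your device. It covers only driver binding, not the IOMMU quarantine check, which still needs Xen's own output.

```python
import os
import sys
from typing import Optional


def bound_driver(bdf: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the driver bound to PCI device `bdf`, or None if unbound."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or no such device
    # The symlink points at .../bus/pci/drivers/<name>; its last component is the driver name.
    return os.path.basename(os.readlink(link))


def held_by_pciback(bdf: str, sysfs_root: str = "/sys") -> bool:
    """True if the device is bound to xen-pciback (assumed sysfs driver name "pciback")."""
    return bound_driver(bdf, sysfs_root) == "pciback"


if __name__ == "__main__":
    # Usage: python3 check_driver.py 0000:05:00.0 [more BDFs...]
    for arg in sys.argv[1:]:
        print(arg, bound_driver(arg) or "<no driver>")
```

If a newly connected device shows a regular driver (for example a GPU or NVMe driver) instead of `pciback` or no driver at all, dom0 has taken it over.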

marmarek avatar May 19 '21 23:05 marmarek

It sounds like enabling this would still be a non-zero increase in security risk, since the intended safety mechanism is yet another thing that could fail in unexpected ways, so shouldn't this be opt-in rather than enabled by default for everyone?

andrewdavidwong avatar May 20 '21 12:05 andrewdavidwong

The decision to disable PCI hotplug was made in #1673.

#3245 is also related, since the dom0 kernel is also used in AppVMs by default. From https://github.com/QubesOS/qubes-issues/issues/3245#issuecomment-706798409:

QEMU notifies the VM of newly device_add-ed PCI devices via the ACPI hotplug mechanism, which is disabled in the dom0 kernel. Maybe use the in-VM kernel instead, or compile another kernel specifically for VMs (see #5212).

This issue is a duplicate of an issue I previously reported, which also happens to be one of the GitHub issues that disappeared a while ago. 😓

iamahuman avatar May 28 '21 13:05 iamahuman

This issue is a duplicate of an issue I previously reported, which also happens to be one of the GitHub issues that disappeared a while ago. 😓

@marmarek, did GitHub Support ever respond to your request? Is there any way to get back the missing issues?

andrewdavidwong avatar May 28 '21 15:05 andrewdavidwong

@marmarek, did GitHub Support ever respond to your request? Is there any way to get back the missing issues?

Sadly, no. I've just pinged them.

marmarek avatar May 28 '21 18:05 marmarek

On Fri, May 28, 2021 at 11:40:24AM -0700, Marek Marczykowski-Górecki wrote:

@marmarek, did GitHub Support ever respond to your request? Is there any way to get back the missing issues?

Sadly, no. I've just pinged them.

Off topic, I know, but a quick gh comparison suggests there are 55 missing issues: 1004, 1017, 1025, 1127, 1510, 1678, 1687, 1794, 1893, 1894, 1898, 2187, 2196, 2221, 2481, 2659, 2669, 2804, 2862, 2898, 3119, 3170, 3272, 3358, 3395, 3402, 3414, 3513, 4037, 4056, 4107, 4108, 4240, 4605, 4690, 4923, 5033, 5035, 5063, 5083, 5154, 5204, 5205, 5325, 5334, 5582, 5812, 5924, 5928, 5929, 6344, 6415, 6422, 6451, 6452

unman avatar May 29 '21 12:05 unman

a quick gh comparison suggests there are 55 missing issues

12 of those appear to be fine (#1687 #1893 #1894 #1898 #2187 #2481 #2659 #3170 #3513 #4923 #5035 #5063). I first guessed that your script hit the API rate limit, but no: they're pull requests, which share the number namespace with issues. That's why.

So it's still only the original 43 issues that are blocked; they are absent when paging through https://api.github.com/repos/QubesOS/qubes-issues/issues?state=all&direction=asc&per_page=100&page=<1 to 67 currently>
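The pitfall above (PR numbers showing up as "missing") can be avoided by treating everything the issues endpoint returns as present. The GitHub REST issues endpoint includes pull requests in its results and marks them with a `pull_request` key. This is a minimal sketch, assuming unauthenticated access (which GitHub rate-limits) and the query parameters from the URL above; the function names are illustrative.

```python
import json
import urllib.request
from typing import Iterable, List

API = "https://api.github.com/repos/QubesOS/qubes-issues/issues"


def fetch_all(max_pages: int) -> List[dict]:
    """Page through the issues endpoint; results include PRs (marked by "pull_request")."""
    items: List[dict] = []
    for page in range(1, max_pages + 1):
        url = f"{API}?state=all&direction=asc&per_page=100&page={page}"
        with urllib.request.urlopen(url) as resp:
            batch = json.load(resp)
        if not batch:
            break  # ran past the last page
        items.extend(batch)
    return items


def issues_only(items: Iterable[dict]) -> List[dict]:
    """Drop pull requests, keeping only real issues."""
    return [item for item in items if "pull_request" not in item]


def missing_numbers(items: Iterable[dict], highest: int) -> List[int]:
    """Numbers in 1..highest that appear neither as an issue nor as a pull request."""
    seen = {item["number"] for item in items}
    return [n for n in range(1, highest + 1) if n not in seen]
```

Calling `missing_numbers(fetch_all(67), highest)` should list only truly hidden numbers; running it on `issues_only(...)` instead would also flag every PR number, which is exactly how the 12 false positives arose.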

rustybird avatar May 29 '21 13:05 rustybird

On Sat, May 29, 2021 at 06:33:05AM -0700, Rusty Bird wrote:

a quick gh comparison suggests there are 55 missing issues

12 of those appear to be fine (#1687 #1893 #1894 #1898 #2187 #2481 #2659 #3170 #3513 #4923 #5035 #5063), maybe your script hit the API rate limit? So they're still only blocking the original 43 issues.

No. I use gh and they don't resolve (GraphQL error: "Could not resolve to an Issue with the number of ...").

unman avatar May 29 '21 13:05 unman

oh they're pull requests sharing the number namespace with issues, that's why.

rustybird avatar May 29 '21 13:05 rustybird

On Sat, May 29, 2021 at 06:59:34AM -0700, Rusty Bird wrote:

oh they're pull requests sharing the number namespace with issues, that's why.

Good catch

unman avatar May 29 '21 14:05 unman

I am using a Thunderbolt 4 docking station to connect my external displays. They aren't recognized (via xrandr) unless they are cold-plugged (connected at boot). Is this issue (#6620) the root cause? And are there any known workarounds, perhaps specific to external displays connected via TB4?

aslfv avatar Sep 02 '22 02:09 aslfv

I am using a Thunderbolt 4 docking station to connect my external displays. They aren't recognized (via xrandr) unless they are cold-plugged (connected at boot). Is this issue (#6620) the root cause? And are there any known workarounds, perhaps specific to external displays connected via TB4?

Yes, this issue is the root cause. If there were a workaround, it would be a bug. For me, the workaround was to use an old-style, non-Thunderbolt ThinkPad Ultra Dock, which works fine with Qubes.

qtpies avatar Sep 02 '22 14:09 qtpies

Yes, this issue is the root cause. If there were a workaround, it would be a bug. For me, the workaround was to use an old-style, non-Thunderbolt ThinkPad Ultra Dock, which works fine with Qubes.

Thank you. Unfortunately, that doesn't seem to be an option for me, as I have not found a non-Thunderbolt docking station that supplies 130 W over USB-C.

aslfv avatar Sep 11 '22 21:09 aslfv

I now wonder whether using a custom kernel with CONFIG_HOTPLUG_PCI=y would be acceptable in my case, despite the risks described above and in #1673. These risks apply to my setup only to a limited extent: FireWire and ExpressCard are not a concern for me. And as described in https://www.kernel.org/doc/html/latest/admin-guide/thunderbolt.html, security levels can be defined for TB. In my case, I have already restricted TB to video and USB only in the BIOS. Would using such a custom kernel still be discouraged under these circumstances?
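Whether a BIOS restriction like this actually keeps PCIe tunnels off can be checked from Linux: per the kernel's Thunderbolt admin guide, each Thunderbolt domain exposes its security level in a sysfs `security` attribute. This is a minimal sketch, assuming the controller appears as `domain0`; the mapping of levels to PCIe tunneling below is a summary of that guide and should be checked against the documentation for your kernel version. Levels not listed are reported as unknown rather than guessed.

```python
import os
from typing import Optional

# Summary of the security levels in the kernel Thunderbolt admin guide
# (an assumption to verify against your kernel's documentation).
PCIE_TUNNELING = {
    "none": "automatic",         # firmware connects every device, PCIe included
    "user": "after approval",    # PCIe tunnels only once the device is authorized
    "secure": "after approval",  # like "user", plus a key-based challenge
    "dponly": "never",           # DisplayPort and USB only, no PCIe tunnels
    "usbonly": "dock USB only",  # only the dock's USB controller; downstream PCIe removed
}


def security_level(domain: str = "domain0", sysfs_root: str = "/sys") -> Optional[str]:
    """Read the security level of a Thunderbolt domain, or None if it does not exist."""
    path = os.path.join(sysfs_root, "bus", "thunderbolt", "devices", domain, "security")
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


def pcie_tunneling(level: Optional[str]) -> str:
    """Describe whether PCIe tunnels can be created at the given level."""
    if level is None:
        return "no Thunderbolt domain found"
    return PCIE_TUNNELING.get(level, "unknown level, check the kernel documentation")
```

With a "video and USB only" BIOS setting one would expect `pcie_tunneling(security_level())` to report `never` (the `dponly` level); note that `usbonly` still tunnels the dock's USB controller, which is itself a PCIe device.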

aslfv avatar Sep 11 '22 21:09 aslfv

Will this ever be resolved in a future update? I have updated to kernel 6.0.8-1, but it is still not fixed. Developers, please fix this problem.

Pesicp avatar Nov 21 '22 16:11 Pesicp

Is this still an issue? I thought it would be and wanted to build a custom ISO, but I saw it enabled in the sources, so I decided to just try the official ISO, and, well, Thunderbolt hotplug works fine!

I really hope I didn't just ruin my own use case, and that it was actually changed on purpose!

Foosec avatar Jun 08 '23 17:06 Foosec

I decided to just try the official ISO, and, well, Thunderbolt hotplug works fine!

What build was that? I just tested 4.2.0-rc3, and CONFIG_HOTPLUG_PCI is not enabled in the dom0 kernel.
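Checking this on a given install comes down to reading the running kernel's build configuration. This is a minimal sketch, assuming the config is shipped as `/boot/config-<release>` (as on Fedora-based systems) or exposed as `/proc/config.gz` (only when the kernel was built with that option). It relies on the standard Kconfig file format, where disabled options appear as `# CONFIG_FOO is not set`.

```python
import gzip
import os
import platform
from typing import Dict


def parse_kconfig(text: str) -> Dict[str, str]:
    """Map CONFIG_* symbols to their values; '# CONFIG_FOO is not set' maps to 'n'."""
    opts: Dict[str, str] = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            opts[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
    return opts


def read_running_config() -> str:
    """Read the running kernel's config from /boot, falling back to /proc/config.gz."""
    boot = f"/boot/config-{platform.release()}"  # platform.release() matches `uname -r`
    if os.path.exists(boot):
        with open(boot) as f:
            return f.read()
    with gzip.open("/proc/config.gz", "rt") as f:  # raises FileNotFoundError if absent
        return f.read()
```

In dom0, `parse_kconfig(read_running_config()).get("CONFIG_HOTPLUG_PCI", "absent")` should print `y` only on a kernel built with PCI hotplug; with it disabled, dependent options such as `CONFIG_HOTPLUG_PCI_PCIE` typically do not appear at all.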

HRio avatar Sep 07 '23 07:09 HRio

Some other users report that Thunderbolt works as well.

So I wonder: a) was this issue silently fixed, and if so, how does it work now? b) Is there a security issue?

3hhh avatar Jan 13 '24 07:01 3hhh

@3hhh it works if you plug in the device (e.g. a dock) before boot; that is not hotplug, however. I don't think anyone reported PCI hotplug to work, including in the thread you linked.

UndeadDevel avatar Jan 13 '24 08:01 UndeadDevel

On 1/13/24 09:51, UndeadDevel wrote:

@3hhh it works if you plug in the device (e.g. a dock) before boot; that is not hotplug, however. I don't think anyone reported PCI hotplug to work, including in the thread you linked.

Ah, ok I see. Yes, that should always have worked I guess.

3hhh avatar Jan 13 '24 11:01 3hhh

Some other users report that Thunderbolt works as well.

So I wonder: a) was this issue silently fixed, and if so, how does it work now? b) Is there a security issue?

It’s a security issue in either their firmware or how their firmware is configured.

DemiMarie avatar Jan 13 '24 16:01 DemiMarie

Granted this was a while ago, but in my testing it worked as a hotplug, so plugging in after booting!

Foosec avatar Jan 15 '24 20:01 Foosec

Granted this was a while ago, but in my testing it worked as a hotplug, so plugging in after booting!

Interesting!

DemiMarie avatar Jan 15 '24 21:01 DemiMarie

Granted this was a while ago, but in my testing it worked as a hotplug, so plugging in after booting!

Is there a guide or forum post on how to replicate this? I'd love for my eGPU to hotplug.

SeqBra avatar Jun 13 '24 23:06 SeqBra

@marmarek: did your testing trust the log output from Xen or dom0, or did it actually try to perform a PCI DMA transaction and see if the operation succeeded?

DemiMarie avatar Jun 13 '24 23:06 DemiMarie

Chiming in as another user who desperately needs this feature. I'm a software engineer, and AI has become an important part of the skill set. Rather than ship all my keystrokes off to OpenAI/Microsoft, I'd like to be able to run an LLM locally. I want to attach a TH3P4 eGPU to my laptop, but something about the boot process always makes it reset. Then the lack of hotplug means that I never actually get to see it.

If there isn't a workaround, I'm probably going to be forced to switch away from Qubes due to the importance of AI-based workflows 😢

duncancmt avatar Jun 23 '24 22:06 duncancmt

I am also in dire need of Thunderbolt/FireWire hotplug.


SeqBra avatar Jun 23 '24 22:06 SeqBra

There aren't any workarounds in R4.2, but there is hope for some (even partial) support in R4.3. Partial means it isn't going to be fully security-supported, but I hope to get it working at least for trusted devices. I'll update this ticket when I have new information and/or something in a testable state.

marmarek avatar Jun 23 '24 23:06 marmarek

There aren't any workarounds in R4.2, but there is hope for some (even partial) support in R4.3. Partial means it isn't going to be fully security-supported, but I hope to get it working at least for trusted devices. I'll update this ticket when I have new information and/or something in a testable state.

Will a fully supported solution need to wait until R4.4?

DemiMarie avatar Jun 24 '24 00:06 DemiMarie