qubes-issues
Enable PCIe hotplug in dom0
The problem you're addressing (if any)
PCIe hotplug is currently disabled in dom0. This causes breakage on some laptops and prevents Thunderbolt from being used, even though a Thunderbolt eGPU on recent hardware is the most secure method I know of to get hardware-accelerated graphics in a qube.
Describe the solution you'd like
We should enable PCIe hotplug in dom0.
Where is the value to a user, and who might that user be?
Many users, including our own @fepitre.
Describe alternatives you've considered
None
Additional context
Previously, having PCIe hotplug enabled in the dom0 kernel was considered a security risk, but Xen developers have indicated that it is not.
Relevant documentation you've consulted
Related, non-duplicate issues
#4353, #5522, #5453
Do you have links to Xen discussions re: change in security posture regarding thunderbolt and/or PCIe hotplug?
Personally I want this...but only if reasonably safe.
Historically, Xen assigned all new devices to dom0 by default (at least IOMMU-wise). Since XSA-302, it has supported an IOMMU quarantine domain, which should (theoretically) be used instead. This should indeed make it reasonably safe to re-enable PCI hotplug. What remains to be done is:
- verifying that newly plugged-in PCI devices are indeed assigned to the IOMMU quarantine domain (`xl debug-key Q && xl dmesg` should show that), and
- verifying that no part of dom0 (especially the kernel and toolstack) tries to move such a freshly connected device to dom0; preventing the relevant driver from loading automatically could be desirable too (attach the device to xen-pciback instead)
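The first check in the list above could be scripted roughly as follows. This is a minimal sketch, assuming it runs as root in dom0, that `xl` is on the PATH, and that the dump names devices by their usual `ssss:bb:dd.f` PCI address. The helper names are hypothetical, and the script only filters the hypervisor log for a human to read; it does not interpret which IOMMU domain a device belongs to.

```python
import subprocess


def lines_mentioning(log_text: str, sbdf: str) -> list[str]:
    """Return the log lines that mention the given PCI address (case-insensitive)."""
    needle = sbdf.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]


def dump_xen_log_after_debug_key() -> str:
    """Send debug key Q to Xen, then return the hypervisor message buffer.

    Equivalent to the `xl debug-key Q && xl dmesg` command from the comment above.
    """
    subprocess.run(["xl", "debug-key", "Q"], check=True)
    result = subprocess.run(
        ["xl", "dmesg"], check=True, capture_output=True, text=True
    )
    return result.stdout


# Usage in dom0 (the address is a placeholder for the hot-plugged device):
#   for line in lines_mentioning(dump_xen_log_after_debug_key(), "0000:3c:00.0"):
#       print(line)
```

Keeping the filtering separate from the `xl` calls means the matching logic can be tried on a saved log without touching the hypervisor.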
It sounds like enabling this would still be a non-zero increase in security risk, since the intended safety mechanism is yet another thing that could fail in unexpected ways, so shouldn't this be opt-in rather than enabled by default for everyone?
The decision to disable PCI hotplugging is at: #1673
#3245 is also related, since dom0 kernel is also used in AppVMs by default. From https://github.com/QubesOS/qubes-issues/issues/3245#issuecomment-706798409:
QEMU notifies newly `device_add`-ed PCI devices to the VM via ACPI hotplugging mechanism, which is disabled in dom0 kernel. Maybe use in-VM kernel instead, or compile another kernel specifically for VMs -- see #5212.
This issue is a duplicate of an issue I previously reported, which also happens to be one of the GitHub issues that disappeared a while ago.
@marmarek, did GitHub Support ever respond to your request? Is there any way to get back the missing issues?
Sadly, not. I've just pinged them.
On Fri, May 28, 2021 at 11:40:24AM -0700, Marek Marczykowski-Górecki wrote:
@marmarek, did GitHub Support ever respond to your request? Is there any way to get back the missing issues?
Sadly, not. I've just pinged them.
Off topic, I know, but a quick gh comparison suggests there are 55 missing issues: 1004, 1017, 1025, 1127, 1510, 1678, 1687, 1794, 1893, 1894, 1898, 2187, 2196, 2221, 2481, 2659, 2669, 2804, 2862, 2898, 3119, 3170, 3272, 3358, 3395, 3402, 3414, 3513, 4037, 4056, 4107, 4108, 4240, 4605, 4690, 4923, 5033, 5035, 5063, 5083, 5154, 5204, 5205, 5325, 5334, 5582, 5812, 5924, 5928, 5929, 6344, 6415, 6422, 6451, 6452
a quick gh comparison suggests there are 55 missing issues
12 of those appear to be fine (#1687 #1893 #1894 #1898 #2187 #2481 #2659 #3170 #3513 #4923 #5035 #5063), ~~maybe your script hit the API rate limit?~~ oh they're pull requests sharing the number namespace with issues, that's why.
So they're still only blocking the original 43 issues, which are absent when paging through https://api.github.com/repos/QubesOS/qubes-issues/issues?state=all&direction=asc&per_page=100&page=<1 to 67 currently>
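The arithmetic behind "55 reported, 12 are pull requests, 43 still missing" can be checked with a few lines of Python. This is only a sketch using the numbers quoted in this thread; the variable and function names are made up for illustration, and nothing here queries GitHub.

```python
# Numbers reported in this thread as not resolving to an issue (55 entries).
reported = [
    1004, 1017, 1025, 1127, 1510, 1678, 1687, 1794, 1893, 1894, 1898,
    2187, 2196, 2221, 2481, 2659, 2669, 2804, 2862, 2898, 3119, 3170,
    3272, 3358, 3395, 3402, 3414, 3513, 4037, 4056, 4107, 4108, 4240,
    4605, 4690, 4923, 5033, 5035, 5063, 5083, 5154, 5204, 5205, 5325,
    5334, 5582, 5812, 5924, 5928, 5929, 6344, 6415, 6422, 6451, 6452,
]

# Numbers that turned out to be pull requests, which share the
# number namespace with issues (12 entries).
pull_requests = [
    1687, 1893, 1894, 1898, 2187, 2481, 2659, 3170, 3513, 4923, 5035, 5063,
]


def still_missing(reported_numbers, pr_numbers):
    """Drop the numbers that are pull requests, keeping the original order."""
    prs = set(pr_numbers)
    return [n for n in reported_numbers if n not in prs]


missing = still_missing(reported, pull_requests)
print(len(reported), len(pull_requests), len(missing))  # prints: 55 12 43
```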
On Sat, May 29, 2021 at 06:33:05AM -0700, Rusty Bird wrote:
a quick gh comparison suggests there are 55 missing issues
12 of those appear to be fine (#1687 #1893 #1894 #1898 #2187 #2481 #2659 #3170 #3513 #4923 #5035 #5063), maybe your script hit the API rate limit? So they're still only blocking the original 43 issues.
No. I use gh and they don't resolve. ("GraphQL error: Could not resolve to an Issue with the number of ...")
oh they're pull requests sharing the number namespace with issues, that's why.
On Sat, May 29, 2021 at 06:59:34AM -0700, Rusty Bird wrote:
oh they're pull requests sharing the number namespace with issues, that's why.
Good catch
I am using a Thunderbolt 4 docking station through which I connect my external displays. These won't be recognized (via xrandr) unless they are cold-plugged (at boot). Is this issue (6620) the root cause? And are there any known workarounds, perhaps specific to external displays connected via TB4?
Yes, this issue is the root cause. If there is a workaround, it would be a bug. For me the workaround was to use a non-Thunderbolt, old-style ThinkPad Ultra Dock, which works fine with Qubes.
Thank you. Unfortunately, that is not an option for me, as I have not found a non-Thunderbolt docking station with a 130 W power supply over USB-C.
I now wonder whether using a custom kernel with CONFIG_HOTPLUG_PCI=y would be acceptable in my case despite the risk described above and in #1673. These risks apply to my setup only to a limited extent: FireWire and ExpressCard are not something I need to worry about. And as described in https://www.kernel.org/doc/html/latest/admin-guide/thunderbolt.html, security levels can be defined for TB. In my case, I have already restricted TB to video and USB only via the BIOS. Would using such a custom kernel still be discouraged under these circumstances?
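To see which security level the firmware actually reports, the kernel documentation linked above describes a `security` attribute under `/sys/bus/thunderbolt/devices/domainX/`. A minimal sketch for reading it is below. It assumes the `thunderbolt` driver is loaded so that the sysfs entries exist, which may not be the case in a stock dom0, and the function name is hypothetical.

```python
from pathlib import Path


def read_security_levels(root: str = "/sys/bus/thunderbolt/devices") -> dict[str, str]:
    """Map each Thunderbolt domain (e.g. 'domain0') to its reported security level.

    Returns an empty dict when no domain exposes a `security` attribute,
    for example when the thunderbolt driver is not loaded.
    """
    levels = {}
    for attr in sorted(Path(root).glob("domain*/security")):
        levels[attr.parent.name] = attr.read_text().strip()
    return levels


# Usage:
#   print(read_security_levels())
```

According to that documentation, levels such as `usbonly` and `dponly` mean no PCIe tunneling takes place, which would correspond to the "video and USB only" BIOS setting described above.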
Will this ever be resolved in a future update? I have updated to kernel 6.0.8-1, but it is still not fixed. Developers, please fix this problem.
Is this still an issue? I thought it would be and wanted to build a custom ISO, but in the sources I saw it as enabled, so I decided to just try the official ISO, and, well, Thunderbolt hotplug works fine!
I really hope I didn't just ruin my own use case and it was actually changed on purpose!
I decided to just try the official ISO, and, well, Thunderbolt hotplug works fine!
What build was that? I just tested 4.2.0-rc3, and CONFIG_HOTPLUG_PCI is not enabled in the dom0 kernel.
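One way to make such a check reproducible is to read the option straight from the kernel's config file. The sketch below assumes the config for the running kernel is installed as `/boot/config-<release>`, which is common on Fedora-based systems but not guaranteed; both helper names are hypothetical.

```python
import platform
from pathlib import Path


def config_value(config_text: str, option: str):
    """Return the value of a kernel config option ('y', 'm', ...), or None.

    Lines such as '# CONFIG_FOO is not set' are comments and yield None.
    """
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def running_kernel_config_path() -> Path:
    """Assumed location of the running kernel's config file."""
    return Path("/boot") / f"config-{platform.release()}"


# Usage:
#   text = running_kernel_config_path().read_text()
#   print(config_value(text, "CONFIG_HOTPLUG_PCI"))
```

Matching on `CONFIG_HOTPLUG_PCI=` rather than the bare name keeps related options such as `CONFIG_HOTPLUG_PCI_PCIE` from being mistaken for it.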
Some other users report thunderbolt to be working as well.
So I wonder whether a) this issue was silently fixed and if so, how does it work now? b) there may be a security issue?
@3hhh it works if you plug in the device (e.g. a dock) before boot; that is not hotplug, however. I don't think anyone reported PCI hotplug to work, including in the thread you linked.
On 1/13/24 09:51, UndeadDevel wrote:
@3hhh it works if you plug in the device (e.g. a dock) before boot; that is not hotplug, however. I don't think anyone reported PCI hotplug to work, including in the thread you linked.
Ah, ok I see. Yes, that should always have worked I guess.
Some other users report thunderbolt to be working as well.
So I wonder whether a) this issue was silently fixed and if so, how does it work now? b) there may be a security issue?
It's a security issue in either their firmware or how their firmware is configured.
Granted this was a while ago, but in my testing it worked as a hotplug, so plugging in after booting!
Interesting!
Is there any guide/forum post to replicate? I'd love my eGPU to hotplug
@marmarek: did your testing trust the log output from Xen or dom0, or did it actually try to perform a PCI DMA transaction and see if the operation succeeded?
Chiming in as another user who desperately needs this feature. I'm a software engineer, and AI has become an important part of the skillset. Rather than ship all my keystrokes off to OpenAI/Microsoft, I'd like to be able to run an LLM locally. I want to attach a TH3P4 eGPU to my laptop, but something about the boot process makes it always reset. Then the lack of hotplug means that I never actually get to see it.
If there isn't a workaround, I'm probably going to be forced to switch off of Qubes due to the importance of AI-based workflows :cry:
I am also in dire need of Thunderbolt/FireWire hotplug.
On Sun, Jun 23, 2024, 5:38 PM duncancmt @.***> wrote:
Chiming in as another user who desperately needs this feature. I'm a software engineer, and AI has become an important part of the skillset. Rather than ship all my keystrokes off to OpenAI/Microsoft, I'd like to be able to run an LLM locally. I want to attach a TH3P4 eGPU to my laptop, but something about the boot process makes it always reset. Then the lack of hotplug means that I never actually get to see it.
If there isn't a workaround, I'm probably going to be forced to switch off of Qubes due to the importance of AI-based workflows :cry:
There aren't any workarounds in R4.2, but there is a hope for some (even partial) support in R4.3. Partial means it isn't going to be fully security supported, but I hope to get it working at least for trusted devices. I'll update this ticket when I get some new information and/or something in testable state.
Will a fully supported solution need to wait until R4.4?