qubes-issues
Use generic modesetting driver instead of i915/i965 as default
Qubes OS version:
R4.1
Affected component(s):
Xorg, fixes graphical artifacts with KDE (and other desktops)
Steps to reproduce the behavior:
From https://groups.google.com/d/msgid/qubes-users/5cc49553-b12c-4e4d-7601-f961330a14e6%40gmail.com
I had some graphical artifacts with the current stable kernel, but with the latest testing version they became more problematic (it cannot draw the main menu, the taskbar does not refresh properly when switching virtual desktops, and some app windows stop redrawing after some time).
I noticed that it also fixed some ghost clicks passing from one window to another (especially when they are full screen).
System tray icons also look better than with the other driver, although I use "border1" mode. I tried the default mode and it was pretty bad, although I am not sure whether it was worse or better than with the i915 driver.
General notes:
According to https://www.phoronix.com/scan.php?page=news_item&px=Fedora-Xorg-Intel-DDX-Switch this seems to be the current default driver in Fedora, so Qubes dom0 should probably adopt the same decision.
Related issues:
https://github.com/QubesOS/qubes-issues/issues/3267
This should resolve itself automatically once we update dom0 in R4.1.
On my X1 Carbon 3rd generation, running R4.1, I was getting graphical artifacts in gnome-terminal and Firefox when scrolling. Switching back to the "intel" Xorg driver from the modesetting driver seems to fix this problem. Thank you to drpfef on IRC for suggesting this.
That is, I added the following to /etc/X11/xorg.conf.d/20-intel.conf
Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
EndSection
Perhaps something like this could be added to https://github.com/Qubes-Community/Contents/blob/master/docs/troubleshooting/intel-igfx-troubleshooting.md for R4.1 users who are having trouble with the new modesetting driver. I didn't submit a pull request for the docs myself because so far this is only my one case.
Just wanted to add that I also see this graphics corruption with the modesetting driver on a T460 (HD Graphics 520), although for some reason it's much rarer.
Thanks @dmoerner! Switching from modesetting to intel fixed the problem for me. Without this, I was seeing artifacts on Qubes 4.1 on my T430i (Intel HD 4000).
I also get glitches at the boot password prompt. I tried fiddling with i915.mitigations=off, but that makes no difference.
As an additional data point, I also observe something very similar (streak-like artifacts on changed/highlighted portions of the screen, only in the GUI part; e.g. the boot trace and the Ctrl+Alt+F2 console work fine) on my Intel ADL-H-based laptop. The default or explicit "modesetting" driver exhibits that, and only the "intel" one fixes the problem (none provide acceleration though). That's on kernel-latest from the stable repo (5.16.13 as of right now), but with the relatively old Mesa packages that dom0 provides.
@marmarek is there any chance we can provide a newer Mesa?
If it is proven to solve the issue first, then maybe. But that's quite a few packages to repackage/rebuild, and I'm not going to do it "just in case". sys-gui-gpu may be helpful for testing various versions. Anyway, since the issue applies to relatively old hardware too (in particular, hardware way older than the fc32 we have in dom0), it's unlikely the upgrade would help.
Yeah, I'm not sure Mesa is the culprit here, TBH. One thing I forgot to mention is that booting the same dom0 directly (i.e. commenting out Xen and changing the module directives to linux and initrd, respectively, in the GRUB menu) yields no artifacts, so it looks like the added variable of Xen changes something, though I don't see anything significant in the log diff.
There's still no HW acceleration when booted directly though - and that's where newer Mesa would probably help, but that's orthogonal to the original issue reported in this thread and I just mentioned that for completeness.
That is interesting. I wonder if disabling the IOMMU for the i915 integrated GPU would help. @marmarek is it safe to do this, on the assumption that the iGPU is trusted?
I have this on one device too. There, starting just Linux (without Xen) helps(*), but iommu=no-igfx does not.
(*) There are no glitches, but when the prompt appears, I need to press ESC twice to make it update after key presses; otherwise no dot appears when entering the passphrase. This could be an issue totally unrelated to the graphics driver.
For the glitches at the boot password prompt, in my case https://github.com/torvalds/linux/commit/bdd8b6c98239cad fixes the issue. Unfortunately, I've seen regressions elsewhere caused by this commit (https://github.com/QubesOS/qubes-issues/issues/7479).
@alt3r-3go @AlxHnr can you test any of the 5.18.x kernel-latest package? It includes the commit mentioned above, and also a follow up fix for it.
Does that follow up fix fix #7479?
Yes. But when both are applied on top of 5.15.52, they bring back glitches on plymouth (on this specific hardware). This is kind of expected, as the follow-up fix undoes https://github.com/torvalds/linux/commit/bdd8b6c98239cad from the i915 driver's point of view... There is probably some other relevant commit somewhere, but I'd like to know how it looks for others.
Maybe there needs to be some hardware-specific quirks?
Regarding testing the 5.18.x kernel-latest package: no, I don't want to. I stopped using graphical plymouth some months ago.
I'm installing those right now. FWIW I've been running 5.18.3 for a while and if that one includes the change in question, it didn't help, the artifacts were still there. I see the latest is 5.18.9 as of now, we'll see.
Tested on 5.18.9: the artifacts are still there. It's a bit different and seems slightly better (faster redraw after the artifacts make it unreadable), but only ever so slightly, as the artifacts are still there and the UI is hardly usable, especially the console, whose text gets swallowed by the artifacts and does not display properly for several seconds until it redraws.
Can you try the nopat option for the dom0 Linux kernel?
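Trying an option like nopat persistently means editing the boot configuration. On a GRUB-based Qubes R4.1 install, dom0 kernel options typically live in GRUB_CMDLINE_LINUX in /etc/default/grub, and Xen options in GRUB_CMDLINE_XEN_DEFAULT (an assumption about the reader's setup). The function below is a hypothetical sketch, not a Qubes tool: it appends one token to the first assignment of a given variable in the file's text and avoids adding duplicates.

```python
import re


def add_boot_option(grub_defaults: str, variable: str, option: str) -> str:
    """Append `option` to the first VAR="..." assignment of `variable`.

    Hypothetical helper operating on the text of /etc/default/grub.
    Returns the text unchanged if the option is already present.
    """
    # Match e.g. GRUB_CMDLINE_LINUX="..." at the start of a line; requiring
    # `="` right after the name keeps GRUB_CMDLINE_LINUX from matching
    # GRUB_CMDLINE_LINUX_DEFAULT.
    pattern = re.compile(
        r'^\s*' + re.escape(variable) + r'="([^"]*)"', re.MULTILINE
    )
    match = pattern.search(grub_defaults)
    if match is None:
        raise KeyError(f"{variable} not found")
    tokens = match.group(1).split()
    if option in tokens:
        return grub_defaults
    # Replace only the quoted value, leaving the rest of the file untouched.
    start, end = match.span(1)
    return grub_defaults[:start] + " ".join(tokens + [option]) + grub_defaults[end:]
```

The same helper applies to Xen options such as noreboot or iommu=no-igfx by passing GRUB_CMDLINE_XEN_DEFAULT. After writing the result back (as root, ideally after a backup), grub.cfg still has to be regenerated with grub2-mkconfig; the output path differs between legacy and UEFI installs.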
Oh, that does work! Apologies if you wanted me to test it with the option from the start :) There's still no acceleration reported by glxinfo, FWIW, but the artifacts are gone and for all intents and purposes it looks exactly like before, when the intel driver was enabled. And yes, I've checked that it uses the modesetting driver now. I haven't looked into it in detail, but it's amusing that such a small change in the way the kernel detects options triggers such an effect.
And to make it explicit - #7479 does not reproduce for me on this kernel (I've actually never seen that, but I skipped 5.17.x kernels).
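One way to check which driver is actually in use, sketched here under assumptions, is dom0's Xorg log (typically /var/log/Xorg.0.log). DDX drivers usually prefix per-screen messages with their name, e.g. modeset(0) for the modesetting driver or intel(0) for the intel driver; the function name and the regex below are hypothetical, not part of Xorg or Qubes.

```python
import re
from collections import Counter
from typing import Optional

# Assumption: DDX drivers tag per-screen Xorg log messages with their name,
# e.g. "(II) modeset(0): ..." for modesetting or "(II) intel(0): ..." for intel.
# "(..)" matches any two-character severity marker such as II, WW, EE or ==.
_SCREEN_MESSAGE = re.compile(r"\(..\) (modeset|intel)\(\d+\):")

# Map log prefixes to the names used in xorg.conf "Driver" lines.
_DRIVER_NAMES = {"modeset": "modesetting", "intel": "intel"}


def active_ddx_driver(xorg_log: str) -> Optional[str]:
    """Guess which DDX driver drove the screen, based on per-screen log prefixes.

    Hypothetical helper; returns None when neither prefix appears.
    """
    counts = Counter(m.group(1) for m in _SCREEN_MESSAGE.finditer(xorg_log))
    if not counts:
        return None
    prefix, _ = counts.most_common(1)[0]
    return _DRIVER_NAMES[prefix]
```

For example, `active_ddx_driver(open("/var/log/Xorg.0.log").read())` in dom0. Looking at per-screen prefixes rather than LoadModule lines is deliberate: during autoconfiguration Xorg may load candidate modules that it later unloads.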
Ok, nopat does more or less what the commit mentioned above does. So we have (at least) two types of hardware:
- where nopat (or the equivalent commit) fixes glitches
- where nopat (or the equivalent commit) causes #7479
I don't think nopat is a real solution; it's rather a workaround that disables something that is broken. But at least we confirmed it is the same issue (or at least a very closely related one) as the one I can reproduce locally.
@marmarek BTW, just to make sure (and this is probably going to be useful for others facing this): is the nopat option good for a daily driver/production machine, or is the intel Xorg driver a better choice?
I'm not familiar with that part of the kernel, and based on the description it looks like disabling PAT should not impact anything (security or performance being the top priorities), but I'm not sure.
Just for the record, this solved the issue on an alder lake notebook: https://github.com/QubesOS/qubes-issues/issues/7507#issuecomment-1153081021
@marmarek, could you please comment on my question above about nopat vs the intel driver? I've been running with the former since then, but on the current 6.0.2 kernel (the kernel-latest package from the "stable" repo) that option causes a reboot loop for me (roughly after a message that VT-d is being activated for gfx). And the artifacts are still there if I use the default driver without the option. I therefore wonder whether I should go back to the intel driver or try to troubleshoot the boot loop.
@DemiMarie, do you have any ideas w.r.t. the above, by any chance (looks like @marmarek is currently too busy or can't comment)?
nopat might make things a bit slower (not sure if noticeable in practice), but it should not cause a reboot. Can you collect a bit more detail about the issue? Maybe add the noreboot option to Xen and see if you can see the actual crash message?
Thanks and yes, let me dig into this.
In the meantime I've tried running with the intel driver, and that caused a GUI freeze (no reaction to mouse/keyboard, no window/desktop refresh, but nothing in the logs and non-GUI processes seemingly running fine), so I reverted to the previously used 5.18 in dom0 (but 6.0.2 in VMs) due to lack of time for proper debugging.
Now that I have your comment, I will look into it (I will probably try the latest 6.0.x from the testing repo first, unless you advise against it).
Quick note - I've recently opened #7894 but it seems to be a duplicate of this issue.
I've tested the nopat boot option workaround but still get glitches.
Using Xorg's intel driver as a workaround isn't stable for me; I get a few random hard reboots a day.
Do you see the crashes with the intel driver + kernel 5.10.112?
I don't remember testing that specific kernel version; I'm running 5.15.76-1 now. If there's a need to test 5.10.112 I could do so, although, as mentioned, crashes with the intel driver seem totally random and don't happen often, so a day without a crash would not guarantee that a particular kernel version works...