
Boot freezes on EndeavourOS

Open steelbath opened this issue 1 year ago • 16 comments

Under EndeavourOS, the boot process gets stuck at "matching deferred by IOUSBHostHIDDevice" and the QEMU display becomes garbled about a minute after freezing. I tried the Ventura, Monterey and pre-installed images.

I have already successfully tested and run Docker-OSX on Ubuntu and Arch, so the problem seems to lie in some EndeavourOS configuration; the packages are the same between Arch and EndeavourOS.

[Screenshots: Screenshot_20230810_062201, Screenshot_20230810_062045]

I really only have space to run one OS, and I would like to use EndeavourOS and get this to work. Please tell me if I can get the macOS boot log out of the VM to share.

System information from template:

1
NAME="EndeavourOS"
PRETTY_NAME="EndeavourOS"
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 119G 70G 43G 62% /
QEMU emulator version 8.0.3
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
libvirtd (libvirt) 9.6.0
total used free shared buff/cache available
Mem: 31Gi 7,0Gi 1,4Gi 186Mi 23Gi 24Gi
Swap: 0B 0B 0B
12
12
crw-rw-rw- 1 root kvm 10, 232 10 aug 06:51 /dev/kvm
total 0
drwxrwxrwt 2 root root 60 10 aug 03:54 .
drwxrwxrwt 15 root root 400 10 aug 08:33 ..
srwxrwxrwx 1 root root 0 10 aug 03:54 X0
root 45761 0.0 0.2 2132888 92416 ? Ssl 06:55 0:01 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
noose 50504 0.0 0.0 9864 2944 pts/2 S+ 08:47 0:00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox dockerd
kvm:x:992:libvirt-qemu,qemu,noose
docker:x:965:noose
libvirt:x:963:noose
libvirt-qemu:x:960:

steelbath avatar Aug 10 '23 08:08 steelbath

Got the same problem today on Arch Linux with the README's Ventura command: it freezes and the display turns glitchy after "matching deferred by IOUSBHostHIDDevice".

Linux Delta 6.4.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Tue, 08 Aug 2023 22:14:05 +0000 x86_64 GNU/Linux
:0
1
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc1          1,9T    1,7T  171G  91% /home
QEMU emulator version 8.0.3
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
libvirtd (libvirt) 9.6.0
               total        used        free      shared  buff/cache   available
Mem:            62Gi        27Gi       9,7Gi       1,8Gi        28Gi        35Gi
Swap:             0B          0B          0B
16
egrep: warning: egrep is obsolescent; using grep -E
16
crw-rw-rw- 1 root kvm 10, 232 11 août  09:32 /dev/kvm
total 0
drwxrwxrwt  2 root root  60 10 août  23:50 .
drwxrwxrwt 25 root root 820 11 août  09:46 ..
srwxrwxrwx  1 root root   0 10 août  23:50 X0
root        1057  0.2  0.1 2429108 101348 ?      Ssl  août10   1:29 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
me       60701  0.0  0.0   6996  2648 pts/11   S+   09:57   0:00 grep --colour=auto --color dockerd
ea3160410470   sickcodes/docker-osx:ventura   "/bin/bash -c 'sudo …"   25 minutes ago   Up 25 minutes   0.0.0.0:50922->10022/tcp, :::50922->10022/tcp   sleepy_hellman
kvm:x:992:me,libvirt-qemu,qemu
libvirt:x:965:me
docker:x:963:me
libvirt-qemu:x:954:

I tried to ssh in just in case, but it does not respond.

andrenasturas avatar Aug 11 '23 08:08 andrenasturas

Same on Fedora 😕

Edit: since I am fortunately using the Silverblue version of Fedora, I was able to roll back my OS to its state from 100 days ago, and it works again, so we should probably report the problem to the QEMU team.

This week I will try to track down where it started to appear.

Also @sickcodes please 🙏 add a HUGE disclaimer to the install instructions. By default the macOS Base System is the startup disk, and this needs to be changed. That broke my current VM on the only day (today) I started the VM without watching the boot process. 😒

RichardFevrier avatar Aug 14 '23 09:08 RichardFevrier

It seems that it is a regression in Linux 6.4.9. Boots successfully on Arch after downgrading to Linux 6.4.8 (I'm using linux-zen).
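
For anyone unsure whether their host falls in the range reported here, the running kernel can be compared against 6.4.9. This is a minimal sketch, assuming a POSIX shell and a `sort` that supports `-V` (GNU coreutils does); `version_ge` is a helper name invented for this example, and the 6.4.9 threshold comes only from the reports in this thread, not from a confirmed bisect.

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B.
# Hypothetical helper; relies on version sort (sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the distro suffix, e.g. "6.4.9-arch1-1" -> "6.4.9".
running="$(uname -r | cut -d- -f1)"

if version_ge "$running" "6.4.9"; then
    echo "kernel $running: at or past 6.4.9, the range reported in this thread"
else
    echo "kernel $running: older than 6.4.9"
fi
```

The check only tells you which side of 6.4.9 the host is on; it says nothing about whether a given distribution backported or later fixed the change.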

For Arch users, old linux packages can be found here

YiPrograms avatar Aug 14 '23 11:08 YiPrograms

Could it be related to Intel Downfall mitigation introduced in 6.4.9?

RichardFevrier avatar Aug 14 '23 12:08 RichardFevrier

I'm not sure how the mitigation works and how it could affect KVM's behaviour.

I would try bisecting the commits introduced in 6.4.9 to find out which one causes the issue. (But it would take some time, maybe later this week.)

YiPrograms avatar Aug 14 '23 18:08 YiPrograms

Also, you're right about the regression in Linux 6.4.9: I jumped back to a Fedora commit from 6 days ago, just before they migrated to 6.4.9, and it is still working.

This is outside of my skills scope, but could it be possible that macOS calls a part that was removed by the mitigation?

RichardFevrier avatar Aug 14 '23 20:08 RichardFevrier

Just realized this has happened to me. This is a nightmare on Arch if you already deleted the pre-upgrade snapshot. Now seeking temporary virtualbox or vmware alternatives which I have used in the past.

TomExMachina avatar Aug 15 '23 05:08 TomExMachina

> Just realized this has happened to me. This is a nightmare on Arch if you already deleted the pre-upgrade snapshot. Now seeking temporary virtualbox or vmware alternatives which I have used in the past.

Honestly I really don't care about distributions but as a developer using an immutable one and having the possibility to use it like git repositories (rpm-ostree deploy commit eq to git reset --hard commit) is a huge plus for me.

RichardFevrier avatar Aug 15 '23 07:08 RichardFevrier

So that everyone knows, I moved to kernel version 6.4.10 with the kernel parameter mitigations=off (⚠️⚠️⚠️ it was just for testing, I rolled back right after, you should not use that on your machine since it implies HUGE security risks ⚠️⚠️⚠️) and I was able to start the macOS Base System, so the mitigation is definitely involved.

If you look at the 6.4.9 changelog you will see that they blocked some AVX + AVX512 instructions while waiting for CPU manufacturers to fix the problem in the CPU microcode.

So the only real solution I can offer is to be patient and wait for your motherboard manufacturer to release the next AGESA (1.2.0.B for AMD) and patch your BIOS with it (it should be released during August), which should bypass the kernel mitigation. Otherwise you could roll back to a previous kernel version, or find the exact blocking mitigation (I tried gather_data_sampling=off, which didn't work) and disable it. (I know that AMD is not affected by GDS, but in our case macOS only uses Intel CPUs.)
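
To see which mitigations the running kernel reports as active before trying to disable any of them, Linux exposes one status file per issue under /sys/devices/system/cpu/vulnerabilities. The following is a rough sketch, assuming that directory is present (it may be missing inside some containers); `show_vulns` is a name made up for this example.

```shell
#!/bin/sh
# show_vulns [DIR]: print "name: status" for every mitigation status file.
# Hypothetical helper; DIR defaults to the kernel's sysfs location.
show_vulns() {
    dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
    if [ ! -d "$dir" ]; then
        echo "no vulnerabilities directory at $dir"
        return 0
    fi
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
        fi
    done
    return 0
}

show_vulns
```

On kernels that carry the relevant patches, entries such as gather_data_sampling should show up in the list with their current status, which makes it easier to tell which mitigation is actually in effect on a given CPU.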

RichardFevrier avatar Aug 15 '23 10:08 RichardFevrier

> > Just realized this has happened to me. This is a nightmare on Arch if you already deleted the pre-upgrade snapshot. Now seeking temporary virtualbox or vmware alternatives which I have used in the past.
>
> Honestly I really don't care about distributions but as a developer using an immutable one and having the possibility to use it like git repositories (rpm-ostree deploy commit eq to git reset --hard commit) is a huge plus for me.

I've been aware of this for years (Nix I think?) but it always seemed like it would introduce a lot of additional maintenance overhead, though I could be wrong. It was just an intuition.

bitnom avatar Aug 15 '23 17:08 bitnom

> I've been aware of this for years (Nix I think?) but it always seemed like it would introduce a lot of additional maintenance overhead, though I could be wrong. It was just an intuition.

With Nix you need to maintain a configuration file for your distro (very powerful, but you need to dig into the docs first). With Silverblue (and its siblings) you need to do nothing; you just have the possibility to use commands like the one I presented earlier.

RichardFevrier avatar Aug 15 '23 19:08 RichardFevrier

For people using Manjaro (and I guess Arch as well): installing Linux 6.5.0rc5-1 "fixes" this issue (package linux-65)

mukaschultze avatar Aug 18 '23 17:08 mukaschultze

I use EndeavourOS as well and got this problem after creating a VM. After a day of troubleshooting I figured that maybe an update had broken something, and here we are.

I used the downgrade package that comes with EndeavourOS and with this command:

sudo downgrade linux linux-headers

I selected 6.4.8 and it worked fine after the reboot.

As always, this carries a risk of breaking something; I did it because I really need to use the VM.

lnormanha avatar Aug 21 '23 22:08 lnormanha

It’s likely that y’all are experiencing the side-effects of the AMD Inception mitigation, which is disabled by spec_rstack_overflow=off. Those weren’t introduced on 6.4.9, but most likely patched by your distributions on that upgrade - hence why looking up the changelog didn’t give you the correct mitigation.
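
If you experiment with these parameters, it helps to confirm that the one you added actually reached the kernel after the reboot. Below is a small sketch that looks for the parameters named in this thread in /proc/cmdline; `has_param` is a helper invented for this example, and how the parameter gets onto the command line in the first place depends on your bootloader, which is not covered here.

```shell
#!/bin/sh
# has_param CMDLINE NAME: succeed if NAME or NAME=value appears as a whole
# word in the given kernel command line. Hypothetical helper.
has_param() {
    for word in $1; do
        case "$word" in
            "$2"|"$2"=*) return 0 ;;
        esac
    done
    return 1
}

cmdline="$(cat /proc/cmdline 2>/dev/null)"
for p in mitigations gather_data_sampling spec_rstack_overflow; do
    if has_param "$cmdline" "$p"; then
        echo "$p is set on the kernel command line"
    else
        echo "$p is not set (kernel default applies)"
    fi
done
```

The same security warning given earlier for mitigations=off applies to switching off any single mitigation, so this is best treated as a diagnostic aid rather than a recommended setup.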

luizribeiro avatar Aug 25 '23 03:08 luizribeiro

> It’s likely that y’all are experiencing the side-effects of the AMD Inception mitigation, which is disabled by spec_rstack_overflow=off. Those weren’t introduced on 6.4.9, but most likely patched by your distributions on that upgrade - hence why looking up the changelog didn’t give you the correct mitigation.

Huge thanks for the clues. I've also encountered this problem on a Zen 3 Ryzen CPU. Now I'm just trying to get a BIOS upgrade to see if the mitigation is fixed.

ToolmanP avatar Sep 05 '23 08:09 ToolmanP

This has been resolved in my arch distro after upgrading. Idk which packages were responsible for the fix but I'm on zen kernel.

bitnom avatar Oct 01 '23 20:10 bitnom