Kernel: x86_64 Fails to boot (physical page alloc issue?)

Open · iblowmymind opened this issue 3 years ago · 8 comments

SerenityOS repository commit ID: 7881331 (08/21/2022)

Issue: SerenityOS fails to boot in a fresh x86_64 build, both with 1G and 2G memory under QEMU. It builds and happily boots with the same configuration under i686. Any help would be appreciated!

Boot arguments:

qemu-system-x86_64.exe -name "SerenityOS" -d guest_errors -m 1G -cpu max,vmx=off,-x2apic -accel whpx,kernel-irqchip=off -smp 2 -drive file=serenityos.qcow2,format=qcow2,index=0,media=disk,id=disk -kernel "Prekernel" -initrd "Kernel" -append "hello disable_virtio" -device pci-bridge,chassis_nr=1,id=bridge1 -device i82801b11-bridge,bus=bridge1,id=bridge2 -device i82801b11-bridge,id=bridge3 -device sdhci-pci,bus=bridge2 -device sdhci-pci,bus=bridge3 -device ich9-ahci,bus=bridge3 -device e1000,bus=bridge1 -device e1000,netdev=breh -netdev user,id=breh,hostfwd=tcp:127.0.0.1:8888-10.0.2.15:8888,hostfwd=tcp:127.0.0.1:8823-10.0.2.15:23,hostfwd=tcp:127.0.0.1:8000-10.0.2.15:8000,hostfwd=tcp:127.0.0.1:2222-10.0.2.15:22 -audiodev dsound,id=snd0 -machine pcspk-audiodev=snd0 -device ac97,audiodev=snd0 -device VGA,vgamem_mb=64 -display sdl,gl=off -spice port=5930,agent-mouse=off,disable-ticketing=on -device virtio-serial,max_ports=2 -device virtconsole,chardev=stdout -device isa-debugcon,chardev=stdout -device virtio-rng-pci -chardev stdio,id=stdout,mux=on -chardev qemu-vdagent,clipboard=on,mouse=off,id=vdagent,name=vdagent -device virtserialport,chardev=vdagent,nr=1 -usb

Kernel bootlog:

Windows Hypervisor Platform accelerator is operational
qemu-system-x86_64.exe: warning: nic e1000.0 has no peer
0.000 [Kernel]: Loading kernel symbol table...
0.000 [Kernel]: CPU[0]: Supported features: sse3 pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 cnxt_id sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 movbe popcnt tsc_deadline aes xsave osxsave avx f16c rdrand hypervisor fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 psn clflush ds acpi mmx fxsr sse sse2 ss htt tm ia64 pbe fsgsbase bmi1 avx2 smep bmi2 erms invpcid zero_fcs_fds mpx rdseed adx smap clflushopt ia32_arch_capabilities lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm perfctr_core perfctr_nb dbx perftsc pcx_l2i syscall mp nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow
0.000 [Kernel]: CPU[0]: Physical address bit width: 39
0.000 [Kernel]: CPU[0]: Virtual address bit width: 48
0.000 [#0 Kernel]: Initializing unhandled interrupt handlers
0.000 [Kernel]: CPU[0]: CPUID hypervisor signature '', max leaf 0x40000010
0.000 [Kernel]: Kernel Commandline: Prekernel hello disable_virtio
0.000 [Kernel]: MM: Multiboot mmap: address=0x0000000000000000, length=654336, type=1
0.000 [Kernel]: MM: Got an unaligned physical_region from the bootloader; correcting length 654336 by 3072 bytes
0.000 [Kernel]: MM: Multiboot mmap: address=0x000000000009fc00, length=1024, type=2
0.000 [Kernel]: MM: Multiboot mmap: address=0x00000000000f0000, length=65536, type=2
0.000 [Kernel]: MM: Multiboot mmap: address=0x0000000000100000, length=1072562176, type=1
0.000 [Kernel]: MM: Multiboot mmap: address=0x000000003ffe0000, length=131072, type=2
0.000 [Kernel]: MM: Multiboot mmap: address=0x00000000fffc0000, length=262144, type=2
0.000 [Kernel]: MM: Multiboot mmap: address=0x000000fd00000000, length=12884901888, type=2
0.000 [Kernel]: MM: Contiguous reserved range from P000000000009fc00, length is 394240
0.000 [Kernel]: MM: Contiguous reserved range from P000000003ffe0000, length is 1098438017024
0.000 [Kernel]: MM: Need 525314 bytes for physical page management, but no memory region is large enough!
[Kernel]: ASSERTION FAILED: not reached
[Kernel]: ./Kernel/Memory/MemoryManager.cpp:418 in void Kernel::Memory::MemoryManager::initialize_physical_pages()
[Kernel]: KERNEL PANIC! :^(
[Kernel]: Aborted
[Kernel]: at ./Kernel/Arch/x86/common/CPU.cpp:35 in void abort()
[Kernel]: Kernel + 0x0000000000ce882f  Kernel::__panic(char const*, unsigned int, char const*) +0x16f
[Kernel]: Kernel + 0x0000000001239671  abort.localalias +0x38a
[Kernel]: Kernel + 0x00000000012392e7  abort.localalias +0x0
[Kernel]: Kernel + 0x00000000015c2f6c  Kernel::Memory::MemoryManager::initialize_physical_pages() [clone .localalias] +0x172c
[Kernel]: Kernel + 0x00000000015ce91f  Kernel::Memory::MemoryManager::parse_memory_map() [clone .localalias] +0x58ff
[Kernel]: Kernel + 0x00000000015d0d64  Kernel::Memory::MemoryManager::MemoryManager() [clone .localalias] +0x7a4
[Kernel]: Kernel + 0x00000000015d3527  Kernel::Memory::MemoryManager::initialize(unsigned int) +0x1b7
[Kernel]: Kernel + 0x0000000001632f56  init +0x506
[Kernel]: Kernel + 0xffffffdff13038f7
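
The two "Contiguous reserved range" lines can be reproduced from the mmap entries above with a simplified model. This model is an assumption chosen because it matches the logged numbers, not SerenityOS's actual algorithm: a reserved run starts at a non-usable (type != 1) entry and extends to the start of the next usable (type = 1) entry, swallowing any gaps in the map, or to the end of the highest entry if no usable entry follows. Under that model, the second run ends exactly at 0x10000000000 (1 TiB), which is where the 12 GiB entry at 0x000000fd00000000 ends.

```python
# Simplified model of the "MM: Contiguous reserved range" log lines.
# Assumption: this reproduces the logged numbers; it is not SerenityOS code.

USABLE = 1  # multiboot mmap type 1 = available RAM

# (address, length, type) copied from the 1G boot log above
MMAP_1G = [
    (0x0000000000000000, 654336, 1),
    (0x000000000009FC00, 1024, 2),
    (0x00000000000F0000, 65536, 2),
    (0x0000000000100000, 1072562176, 1),
    (0x000000003FFE0000, 131072, 2),
    (0x00000000FFFC0000, 262144, 2),
    (0x000000FD00000000, 12884901888, 2),
]


def reserved_runs(mmap):
    """Return (start, length) for each run of non-usable entries.

    A run starts at the first non-usable entry after a usable one and ends
    where the next usable entry begins (gaps in the map are included), or at
    the end of the highest entry if no usable entry follows.
    """
    entries = sorted(mmap)
    highest_end = max(addr + length for addr, length, _ in entries)
    runs = []
    run_start = None
    for addr, _length, kind in entries:
        if kind != USABLE:
            if run_start is None:
                run_start = addr
        elif run_start is not None:
            runs.append((run_start, addr - run_start))
            run_start = None
    if run_start is not None:
        runs.append((run_start, highest_end - run_start))
    return runs


for start, length in reserved_runs(MMAP_1G):
    print(f"P{start:016x}, length is {length}")
# P000000000009fc00, length is 394240
# P000000003ffe0000, length is 1098438017024
```

Fed the 4G map from the second log, the same function also yields the three ranges reported there.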

Build host: Ubuntu/WSL2 running on Windows 10 21H2 64-bit (19044.1889), Hypervisor Platform enabled

uname -a:

Linux BLW-HP 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Build steps:

git clone https://github.com/SerenityOS/serenity.git && cd serenity && Meta/serenity.sh rebuild-toolchain x86_64 && Meta/serenity.sh build x86_64 && Meta/serenity.sh copy-src x86_64

Afterwards, since I wanted faster I/O, I copied the image, Kernel, Prekernel, and launch arguments (the latter into a batch file) to a separate directory on my SSD, and converted the disk image to QCOW2 (because why not?).

No errors were reported during build.

Additional notes: Do note that I built i686 successfully before, and it worked without any issues, both with 1 & 2 GBs of RAM. This only happened in a clean build of x86_64.

iblowmymind, Aug 21 '22 13:08

To add, here's the log with 4G of RAM allocated:

0.000 [Kernel]: Loading kernel symbol table...
0.000 [Kernel]: CPU[0]: Supported features: sse3 pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 cnxt_id sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 movbe popcnt tsc_deadline aes xsave osxsave avx f16c rdrand hypervisor fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 psn clflush ds acpi mmx fxsr sse sse2 ss htt tm ia64 pbe fsgsbase bmi1 avx2 smep bmi2 erms invpcid zero_fcs_fds mpx rdseed adx smap clflushopt ia32_arch_capabilities lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm perfctr_core perfctr_nb dbx perftsc pcx_l2i syscall mp nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow
0.000 [Kernel]: CPU[0]: Physical address bit width: 39
0.000 [Kernel]: CPU[0]: Virtual address bit width: 48
0.000 [#0 Kernel]: Initializing unhandled interrupt handlers
0.000 [Kernel]: CPU[0]: CPUID hypervisor signature '', max leaf 0x40000010
0.000 [Kernel]: Kernel Commandline: Prekernel hello disable_virtio
0.000 [Kernel]: MM: Multiboot mmap: address=0x0000000000000000, length=654336, type=1
0.000 [Kernel]: MM: Got an unaligned physical_region from the bootloader; correcting length 654336 by 3072 bytes
0.000 [Kernel]: MM: Multiboot mmap: address=0x000000000009fc00, length=1024, type=2
0.000 [Kernel]: MM: Multiboot mmap: address=0x00000000000f0000, length=65536, type=2
0.000 [Kernel]: MM: Multiboot mmap: address=0x0000000000100000, length=3220045824, type=1
0.000 [Kernel]: MM: Multiboot mmap: address=0x00000000bffe0000, length=131072, type=2
0.000 [Kernel]: MM: Multiboot mmap: address=0x00000000fffc0000, length=262144, type=2
0.000 [Kernel]: MM: Multiboot mmap: address=0x0000000100000000, length=1073741824, type=1
0.000 [Kernel]: MM: Multiboot mmap: address=0x000000fd00000000, length=12884901888, type=2
0.000 [Kernel]: MM: Contiguous reserved range from P000000000009fc00, length is 394240
0.000 [Kernel]: MM: Contiguous reserved range from P00000000bffe0000, length is 1073872896
0.000 [Kernel]: MM: Contiguous reserved range from P000000fd00000000, length is 12884901888
0.000 [Kernel]: RegionTree: Failed to allocate anywhere: size=2147487744, alignment=4096
[Kernel]: ASSERTION FAILED: !_temporary_result.is_error()
[Kernel]: ./Kernel/Memory/MemoryManager.cpp:445 in void Kernel::Memory::MemoryManager::initialize_physical_pages()
[Kernel]: KERNEL PANIC! :^(
[Kernel]: Aborted
[Kernel]: at ./Kernel/Arch/x86/common/CPU.cpp:35 in void abort()
[Kernel]: Kernel + 0x0000000000ce882f  Kernel::__panic(char const*, unsigned int, char const*) +0x16f
[Kernel]: Kernel + 0x0000000001239671  abort.localalias +0x38a
[Kernel]: Kernel + 0x00000000012392e7  abort.localalias +0x0
[Kernel]: Kernel + 0x00000000015c4716  Kernel::Memory::MemoryManager::initialize_physical_pages() [clone .localalias] +0x2ed6
[Kernel]: Kernel + 0x00000000015ce91f  Kernel::Memory::MemoryManager::parse_memory_map() [clone .localalias] +0x58ff
[Kernel]: Kernel + 0x00000000015d0d64  Kernel::Memory::MemoryManager::MemoryManager() [clone .localalias] +0x7a4
[Kernel]: Kernel + 0x00000000015d3527  Kernel::Memory::MemoryManager::initialize(unsigned int) +0x1b7
[Kernel]: Kernel + 0x0000000001632f56  init +0x506
[Kernel]: Kernel + 0xffffffdfdc9038f7
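
With 4G, the usable entry at 0x100000000 splits the reserved space, so the last reserved range starts at 0xfd00000000; the highest address in the map is still 0xfd00000000 + 12 GiB = 1 TiB, though. The failed allocation of 2147487744 bytes is exactly 2 GiB plus one 4 KiB page. One plausible reading, assuming the kernel sizes per-page bookkeeping by the highest physical address in the map at 8 bytes per 4 KiB page (both assumptions, not taken from the SerenityOS source), is that a 1 TiB top address alone asks for about 2 GiB:

```python
# Back-of-the-envelope check of the 4G boot log.
# Assumptions (NOT taken from the SerenityOS source): per-page bookkeeping
# is sized by the highest physical address in the memory map, at a
# hypothetical 8 bytes per 4 KiB page.

PAGE_SIZE = 4096
BYTES_PER_PAGE_ENTRY = 8  # hypothetical per-page metadata size

# End of the highest mmap entry: the 12 GiB type=2 block at 0xfd00000000
highest_address = 0x000000FD00000000 + 12884901888
# End of the highest entry once that block is ignored: 1 GiB of RAM at 4 GiB
highest_without_block = 0x0000000100000000 + 1073741824

bookkeeping = highest_address // PAGE_SIZE * BYTES_PER_PAGE_ENTRY
without_block = highest_without_block // PAGE_SIZE * BYTES_PER_PAGE_ENTRY
failed_size = 2147487744  # from "RegionTree: Failed to allocate anywhere"

print(hex(highest_address))       # 0x10000000000 (1 TiB)
print(bookkeeping)                # 2147483648 (2 GiB)
print(failed_size - bookkeeping)  # 4096 (one extra 4 KiB page)
print(without_block)              # 10485760 (10 MiB)
```

If those assumptions hold, the 12 GiB block by itself accounts for the jump from roughly 10 MiB to 2 GiB of bookkeeping.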

iblowmymind, Aug 21 '22 13:08

To confirm, this only happens when you bundle the prekernel, kernel, and rootfs into a qcow2 image?

What's the output of qemu-system-x86_64.exe --version? (Or -v; I'm not sure which is the right argument for QEMU.)

ADKaster, Aug 21 '22 18:08

No, no, it's still built the same way as the regular QEMU images; the kernel and prekernel are still separate. I just converted the filesystem image to QCOW2 so I can manage snapshots more easily and benefit from other copy-on-write features.

I'm not at my PC right now, but it is the latest w64 build from weilnetz.de (7.1-rc2; the internal version reports 7.0.94, I think).

iblowmymind, Aug 21 '22 19:08

Given some conversations on Discord, it looks like this is a Qemu 7.1 regression. Or more likely, Qemu defined some undefined behavior we were relying on in TCG in 7.1 and we need to fix our page table allocation code :D

ADKaster, Sep 03 '22 18:09

The workaround for this is to add the following to the Qemu args:

diff --git a/Meta/run.sh b/Meta/run.sh
index 62080175f9..fa60a3c039 100755
--- a/Meta/run.sh
+++ b/Meta/run.sh
@@ -256,6 +256,7 @@ if [ -z "$SERENITY_MACHINE" ]; then
         SERENITY_MACHINE="-M raspi3b -serial stdio"
     else
         SERENITY_MACHINE="
+        -machine pc-i440fx-7.0
         -m $SERENITY_RAM_SIZE
         -smp $SERENITY_CPUS
         -display $SERENITY_QEMU_DISPLAY_BACKEND

The following Qemu patch seems to have added a 12 GiB block to the multiboot memory map that we weren't expecting:

https://gitlab.com/qemu-project/qemu/-/commit/8504f129450b909c88e199ca44facd35d38ba4de

The next commit shows the escape hatch they created for pre-7.1 machine types

https://gitlab.com/qemu-project/qemu/-/commit/b3e6982b4154c1c0ab8b25f2e1ac7838a1809824

Not sure if this is a bug in the kernel/prekernel, or if that region is supposed to show up in our region list.
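
If such entries are legitimate, one hypothetical kernel-side direction (a sketch under that assumption, not what SerenityOS does) is to size page bookkeeping by the end of the highest usable (type = 1) entry instead of the end of any entry, so that a reserved hole far above RAM costs nothing. Using the 1G map from the first log:

```python
# Hypothetical mitigation sketch: bound page bookkeeping by the highest
# *usable* address rather than the highest address of any mmap entry.

USABLE = 1  # multiboot mmap type 1 = available RAM

# (address, length, type) from the 1G boot log
MMAP_1G = [
    (0x0000000000000000, 654336, 1),
    (0x000000000009FC00, 1024, 2),
    (0x00000000000F0000, 65536, 2),
    (0x0000000000100000, 1072562176, 1),
    (0x000000003FFE0000, 131072, 2),
    (0x00000000FFFC0000, 262144, 2),
    (0x000000FD00000000, 12884901888, 2),
]


def highest_end(mmap, only_usable):
    """End address of the highest entry, optionally only counting RAM."""
    return max(
        addr + length
        for addr, length, kind in mmap
        if not only_usable or kind == USABLE
    )


print(hex(highest_end(MMAP_1G, only_usable=False)))  # 0x10000000000 (1 TiB)
print(hex(highest_end(MMAP_1G, only_usable=True)))   # 0x3ffe0000 (just under 1 GiB)
```

The trade-off is that any code which later looks up bookkeeping for a reserved or MMIO physical address above that bound would need to handle the miss; whether that matters for SerenityOS is not covered in this thread.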

ADKaster, Sep 05 '22 21:09

Woohoo, boots now! Marking the issue as closed, if no further discussion is to be made. Has this patch been included in any commits since then? I haven't pulled the repo in a while; I just inserted that into my QEMU arguments lol.

iblowmymind, Sep 10 '22 12:09

> Woohoo, boots now! Marking the issue as closed, if no further discussion is to be made.

Well, it's still an issue if this is going to stay the same in future QEMU versions. Let's keep it open while we figure out the solution.

sin-ack, Sep 16 '22 08:09

On second thought, that would be more appropriate. Keeping open.

iblowmymind, Sep 18 '22 13:09