kernel icon indicating copy to clipboard operation
kernel copied to clipboard

Hermit gets stuck with virtiofsd + netdev

Open Gelbpunkt opened this issue 9 months ago • 4 comments

When running Hermit with virtiofsd and a networking device, it gets stuck after binding a TCP listener. It can be "un-stuck" by hitting enter to trigger an interrupt. The httpd example e.g. hangs for every single request and I can hit enter during every request to make Hermit progress. Logs indicate that this is because it enters the idle task and never leaves it.

Full logs:
[LOADER][INFO] Loader: [0x100000 - 0x121059]
[LOADER][INFO] Found Multiboot information at 0x9500
[LOADER][WARN] Mapping 1 4KiB pages from 0x9000..0xa000 to 0x9000..0xa000
[LOADER][INFO] Found an ELF module at [0x123000 - 0x5a5420]
[LOADER][INFO] Module length: 0x482420
[LOADER][WARN] Mapping 4KiB pages starting from 0x123000 to frames 0x123000..0x200000
[LOADER][WARN] Mapping 2MiB pages starting from 0x200000 to frames 0x200000..0x600000
[LOADER][INFO] Parsing kernel from ELF at 0x123000..0x5a5420 (len = 0x482420 B / 4727840 B)
[LOADER][INFO] Found Hermit version 0.11.0
[LOADER][WARN] Mapping 3 2MiB pages from 0x800000..0xe00000 to 0x800000..0xe00000
[LOADER][INFO] Loading kernel to 0x800000..0xc12180 (len = 0x412180 B / 4268416 B)
[LOADER][INFO] TLS is at 0xade078..0xade110 (len =  0x98 B / 152 B)
[LOADER][WARN] Mapping 8 4KiB pages from 0xa000..0x12000 to 0xa000..0x12000
[LOADER][INFO] boot_info = BootInfo {
    hardware_info: HardwareInfo {
        phys_addr_range: 0x0..0x0,
        serial_port_base: Some(
            0x3f8,
        ),
        device_tree: Some(
            0x12055d,
        ),
    },
    load_info: LoadInfo {
        kernel_image_addr_range: 0x800000..0xc12180,
        tls_info: Some(
            TlsInfo {
                start: 0xade078,
                filesz: 0x21,
                memsz: 0x98,
                align: 0x8,
            },
        ),
    },
    platform_info: Multiboot {
        command_line: Some(
            "../../kernel/hermit-loader-x86_64 ",
        ),
        multiboot_info_addr: 0x9500,
    },
}
[LOADER][INFO] Entering kernel at 0xa78184, stack at 0xa000, raw_boot_info at 0x11f008
[0][INFO] Welcome to Hermit 0.11.0
[0][INFO] Kernel starts at 0x800000
[0][INFO] FDT:
 / {
    compatible = "hermit,multiboot"
    #address-cells = <0x2>
    #size-cells = <0x2>

    memory@0 {
        device_type = "memory"
        reg = <0x0 0x9fc00>
    };

    memory@100000 {
        device_type = "memory"
        reg = <0x100000 0x3fee0000>
    };

    chosen {
        bootargs = [46, 46, 47, 46, 46, 47, 107, 101, 114, 110, 101, 108, 47, 104, 101, 114, 109, 105, 116, 45, 108, 111, 97, 100, 101, 114, 45, 120, 56, 54, 95, 54, 52, 32, 0]
    };
};

[0][INFO] BSS starts at 0xc11128
[0][INFO] tls_info = Some(
    TlsInfo {
        start: 0xade078,
        filesz: 0x21,
        memsz: 0x98,
        align: 0x8,
    },
)
[0][DEBUG] Setting CR0 = Cr0Flags(PROTECTED_MODE_ENABLE | MONITOR_COPROCESSOR | TASK_SWITCHED | EXTENSION_TYPE | NUMERIC_ERROR | WRITE_PROTECT | PAGING)
[0][DEBUG] Enable FSGSBASE support
[0][DEBUG] Setting CR4 = Cr4Flags(PHYSICAL_ADDRESS_EXTENSION | MACHINE_CHECK_EXCEPTION | PAGE_GLOBAL | OSFXSR | OSXMMEXCPT_ENABLE | FSGSBASE | OSXSAVE)
[0][DEBUG] Setting XCR0 = XCr0Flags(X87 | SSE | AVX)
[0][DEBUG] Making P4 table writable
[0][INFO] Total memory size: 1009 MiB
[0][INFO] Kernel region: 0x800000..0xe00000
[0][INFO] Minimum memory size: 19
[0][INFO] Heap: size 888 MB, start address 0x400000000000
[0][INFO] Heap is located at 0x400000000000..0x400037800000 (0 Bytes unmapped)
[0][INFO] Physical memory free list:
         0x1002000..         0x1200000 (len =           0x1fe000, pages =              510)
        0x38800000..        0x3ffe0000 (len =          0x77e0000, pages =            30688)
[0][INFO] Virtual memory free list:
    0x400037800000..    0x800000000000 (len =     0x3fffc8800000, pages =      17179641856)
[0][INFO] bootargs = ../../kernel/hermit-loader-x86_64
[0][INFO]
[0][INFO] ========================== CPU INFORMATION ===========================
[0][INFO] Model:                   AMD Ryzen 7 6800U with Radeon Graphics
[0][INFO] Frequency:               2693 MHz (from Measurement)
[0][INFO] SpeedStep Technology:    Not Available
[0][INFO] Features:                MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AESNI RDRAND FMA MOVBE MCE FXSR XSAVE RDTSCP CLFLUSH TSC-DEADLINE X2APIC HYPERVISOR AVX2 BMI1 BMI2 PKU FSGSBASE RDSEED
[0][INFO] Physical Address Width:  48 bits
[0][INFO] Linear Address Width:    48 bits
[0][INFO] Supports 1GiB Pages:     Yes
[0][INFO] ======================================================================
[0][INFO]
[0][DEBUG] Cr0 = Cr0Flags(PROTECTED_MODE_ENABLE | MONITOR_COPROCESSOR | TASK_SWITCHED | EXTENSION_TYPE | NUMERIC_ERROR | WRITE_PROTECT | PAGING)
[0][DEBUG] Cr4 = Cr4Flags(PHYSICAL_ADDRESS_EXTENSION | MACHINE_CHECK_EXCEPTION | PAGE_GLOBAL | OSFXSR | OSXMMEXCPT_ENABLE | FSGSBASE | OSXSAVE)
[0][INFO] Hermit booted on 2025-04-11 15:03:50.887795 +00:00:00
[0][DEBUG] Scanning PCI Busses 0 to 31
[0][INFO] Found an ACPI revision 0 table at 0xF52D0 with OEM ID "BOCHS "
[0][DEBUG] Found ACPI table: FACP
[0][DEBUG] Had to unmap page Page[4KiB](0x400037802000) before mapping.
[0][DEBUG] Had to unmap page Page[4KiB](0x400037803000) before mapping.
[0][DEBUG] Found ACPI table: APIC
[0][DEBUG] Had to unmap page Page[4KiB](0x400037804000) before mapping.
[0][DEBUG] Had to unmap page Page[4KiB](0x400037805000) before mapping.
[0][DEBUG] Found ACPI table: HPET
[0][DEBUG] Had to unmap page Page[4KiB](0x400037804000) before mapping.
[0][DEBUG] Had to unmap page Page[4KiB](0x400037805000) before mapping.
[0][DEBUG] Found ACPI table: SRAT
[0][DEBUG] Had to unmap page Page[4KiB](0x400037804000) before mapping.
[0][DEBUG] Had to unmap page Page[4KiB](0x400037805000) before mapping.
[0][DEBUG] Found ACPI table: WAET
[0][DEBUG] Found Processor Local APIC record: { acpi_processor_id: 0, apic_id: 0, flags: 1 }
[0][DEBUG] Found I/O APIC record: { id: 0, reserved: 0, address: 0xFEC00000, global_system_interrupt_base: 0 }
[0][DEBUG] Mapping IOAPIC at 0xfec00000 to virtual address 0x400037800000
[0][DEBUG] Had to unmap page Page[4KiB](0x400037800000) before mapping.
[0][DEBUG] Enable x2APIC support
[0][DEBUG] Register name "TLB flush"  for interrupt 80
[0][DEBUG] Register name "Wakeup"  for interrupt 89
[0][INFO] IOAPIC v17 has 24 entries
[0][DEBUG] Disabling irq 0
[0][DEBUG] Disabling irq 2
[0][DEBUG] Register name "Timer"  for interrupt 91
[0][DEBUG] Register name "COM1"  for interrupt 4
[0][DEBUG] Creating idle task 0
[0][DEBUG] Using boot stack 0xa000
[0][DEBUG] IST1 is located at 0x400000001000
[0][DEBUG] Initializing scheduler for core 0 with idle task 0
[0][DEBUG] SMP boot code is 225 bytes long
[0][DEBUG] Mapping SMP boot code to physical and virtual address 0x8000
[0][DEBUG] Set entry point for application processor to 0xa78184
[0][INFO]
[0][INFO] ===================== MULTIPROCESSOR INFORMATION =====================
[0][INFO] APIC in use:             x2APIC
[0][INFO] Initialized CPUs:        1
[0][INFO] ======================================================================
[0][INFO]
[0][INFO] Compiled with PCI support
[0][INFO] Compiled with ACPI support
[0][INFO] Compiled with FSGSBASE support
[0][INFO] Compiled with SMP support
[0][INFO]
[0][INFO] ======================== PCI BUS INFORMATION =========================
[0][INFO] 00:00 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC [Natoma] [8086:1237]
[0][INFO] 00:01 ISA bridge [0601]: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] [8086:7000]
[0][INFO] 00:02 VGA compatible controller [0300]: Unknown Vendor Unknown Device [1234:1111], BAR0 Memory32 { address: 0xFD000000, size: 0x1000000, prefetchable: true }, BAR2 Memory32 { address: 0xFEBD0000, size: 0x1000, prefetchable: false }
[0][INFO] 00:03 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1AF4:1041], IRQ 11, BAR1 Memory32 { address: 0xFEBD1000, size: 0x1000, prefetchable: false }, BAR4 Memory64 { address: 0xFE000000, size: 0x4000, prefetchable: true }
[0][INFO] 00:04 Mass storage controller [0180]: Red Hat, Inc. Virtio file system [1AF4:105A], IRQ 11, BAR1 Memory32 { address: 0xFEBD2000, size: 0x1000, prefetchable: false }, BAR4 Memory64 { address: 0xFE004000, size: 0x4000, prefetchable: true }
[0][INFO] ======================================================================
[0][INFO]
[0][DEBUG] Create stacks at 0x400037804000 with a size of 1120 KB
[0][DEBUG] Had to unmap page Page[4KiB](0x400037805000) before mapping.
[0][DEBUG] Creating new task 1 on core 0
[0][DEBUG] Creating task 1 with priority 2 on core 0
[0][DEBUG] Task is available.
[0][DEBUG] Switching task from 0 to 1 (stack 0x0 => 0x40003781df60)
[0][INFO] Hermit is running on common system!
[0][INFO] Found virtio device with device id 0x1041
[0][DEBUG] Mapping bar 4 at 0xfe000000 with length 0x4000
[0][DEBUG] Mapping bar 4 at 0xfe000000 with length 0x4000
[0][DEBUG] Mapping bar 4 at 0xfe000000 with length 0x4000
[0][DEBUG] Mapping bar 4 at 0xfe000000 with length 0x4000
[0][INFO] Feature set wanted by network driver are in conformance with specification.
[0][INFO] Feature set wanted by network driver are in conformance with specification.
[0][INFO] Driver found a subset of features for virtio device 1041. Features are: F(MAC | MRG_RXBUF | STATUS | MQ | INDIRECT_DESC | VERSION_1 | RING_PACKED)
[0][INFO] Features have been negotiated between virtio network device 1041 and driver.
[0][INFO] Created PackedVq: idx=0, size=256
[0][INFO] Created PackedVq: idx=1, size=256
[0][INFO] Network driver successfully initialized virtqueues.
[0][INFO] Device specific initialization for Virtio network device 1041 finished
[0][DEBUG] ChecksumCapabilities { ipv4: Both, udp: Both, tcp: Both, icmpv4: Both, icmpv6: Both }
[0][INFO] Network device with id 1041, has been initialized by driver!
[0][INFO] Virtio-net link is up after initialization.
[0][INFO] Virtio network driver initialized.
[0][DEBUG] Register name "virtio"  for interrupt 11
[0][INFO] Virtio interrupt handler at line 11
[0][INFO] Found virtio device with device id 0x105a
[0][DEBUG] Mapping bar 4 at 0xfe004000 with length 0x4000
[0][DEBUG] Mapping bar 4 at 0xfe004000 with length 0x4000
[0][DEBUG] Mapping bar 4 at 0xfe004000 with length 0x4000
[0][DEBUG] Mapping bar 4 at 0xfe004000 with length 0x4000
[0][DEBUG] Feature set wanted by filesystem driver are in conformance with specification.
[0][INFO] Features have been negotiated between virtio filesystem device 105a and driver.
[0][INFO] Created SplitVq: idx=0, size=1024
[0][INFO] Created SplitVq: idx=1, size=1024
[0][INFO] Filesystem device with id 105a, has been initialized by driver!
[0][INFO] Virtio filesystem driver initialized.
[0][INFO] Try to initialize network!
[0][INFO] MAC address 52-54-00-12-34-56
[0][INFO] ChecksumCapabilities { ipv4: Both, udp: Both, tcp: Both, icmpv4: Both, icmpv6: Both }
[0][INFO] MTU: 1514 bytes
[0][DEBUG] Create directory /tmp
[0][DEBUG] received interrupt 43
[0][INFO] DHCP lost config!
[0][INFO] DHCP config acquired!
[0][INFO] IP address:      192.168.76.9/24
[0][INFO] Default gateway: 192.168.76.2
[0][INFO] DNS server 0:    192.168.76.3
[0][DEBUG] received interrupt 43
[0][DEBUG] Create directory /proc
[0][DEBUG] Open /proc/version, OpenOption(O_RDWR | O_CREAT), AccessPermission(S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH)
[0][DEBUG] Open file /proc/version with OpenOption(O_RDWR | O_CREAT)
[0][DEBUG] Try to initialize fuse filesystem
[0][DEBUG] received interrupt 43
[0][INFO] Mounting virtio-fs at /tracedir
[0][DEBUG] Mounting /tracedir
[0][INFO] Try to initialize uhyve filesystem
[0][DEBUG] Setting argv as: []
[0][DEBUG] Setting envv as: []
[0][INFO] Jumping into application
[0][DEBUG] Switching FPU owner from task 0 to 1
[0][DEBUG] sys_socket: domain 0, type SockType(SOCK_STREAM), protocol 0
[0][DEBUG] sys_setsockopt: 3, level 4095, optname 4
[0][DEBUG] Create stacks at 0x400037940000 with a size of 1120 KB
[0][DEBUG] Creating new task 2 on core 0
[0][DEBUG] Creating task 2 with priority 2 on core 0
Now listening on port 9975
[0][DEBUG] Blocking task 1
[0][DEBUG] Task is available.
[0][DEBUG] Switching task from 1 to 2 (stack 0x40003781DF60 => 0x400037959f60)
[0][DEBUG] Switching FPU owner from task 1 to 2
[0][DEBUG] Create stacks at 0x400037a5c000 with a size of 1120 KB
[0][DEBUG] Creating new task 3 on core 0
[0][DEBUG] Creating task 3 with priority 2 on core 0
[0][DEBUG] Create stacks at 0x400037b78000 with a size of 1120 KB
[0][DEBUG] Creating new task 4 on core 0
[0][DEBUG] Creating task 4 with priority 2 on core 0
[0][DEBUG] Create stacks at 0x400037c94000 with a size of 1120 KB
[0][DEBUG] Creating new task 5 on core 0
[0][DEBUG] Creating task 5 with priority 2 on core 0
[0][DEBUG] Create stacks at 0x400037db0000 with a size of 1120 KB
[0][DEBUG] Creating new task 6 on core 0
[0][DEBUG] Creating task 6 with priority 2 on core 0
[0][DEBUG] Blocking task 2
[0][DEBUG] Task is available.
[0][DEBUG] Switching task from 2 to 3 (stack 0x400037959F60 => 0x400037a75f60)
[0][DEBUG] Switching FPU owner from task 2 to 3
[0][DEBUG] Blocking task 3
[0][DEBUG] Task is available.
[0][DEBUG] Switching task from 3 to 4 (stack 0x400037A75F60 => 0x400037b91f60)
[0][DEBUG] Switching FPU owner from task 3 to 4
[0][DEBUG] Blocking task 4
[0][DEBUG] Task is available.
[0][DEBUG] Switching task from 4 to 5 (stack 0x400037B91F60 => 0x400037cadf60)
[0][DEBUG] Switching FPU owner from task 4 to 5
[0][DEBUG] Blocking task 5
[0][DEBUG] Task is available.
[0][DEBUG] Switching task from 5 to 6 (stack 0x400037CADF60 => 0x400037dc9f60)
[0][DEBUG] Switching FPU owner from task 5 to 6
[0][DEBUG] Blocking task 6
[0][DEBUG] Only Idle Task is available.
[0][DEBUG] Switching task from 6 to 0 (stack 0x400037DC9F60 => 0x11d98)
# STUCK HERE
[0][DEBUG] Waking up task 2 on core 0
[0][DEBUG] Task is available.
[0][DEBUG] Switching task from 0 to 2 (stack 0x11D98 => 0x400037959770)
[0][DEBUG] Switching FPU owner from task 6 to 2
[0][DEBUG] Waking up task 3 on core 0
[0][DEBUG] Switching task from 2 to 3 (stack 0x400037959770 => 0x400037a75970)
[0][DEBUG] Switching FPU owner from task 2 to 3
[0][DEBUG] Waking up task 1 on core 0
[0][DEBUG] Blocking task 3
[0][DEBUG] Task is available.
[0][DEBUG] Switching task from 3 to 2 (stack 0x400037A75970 => 0x400037948e70)
[0][DEBUG] Blocking task 2
[0][DEBUG] Task is available.
[0][DEBUG] Switching task from 2 to 1 (stack 0x400037948E70 => 0x40003781d970)
Request(GET / from Some(192.168.76.2:47352))
[0][DEBUG] Switching FPU owner from task 3 to 1
[0][DEBUG] Blocking task 1
[0][DEBUG] Waking up task 3 on core 0
[0][DEBUG] Task is available.
[0][DEBUG] Switching task from 1 to 3 (stack 0x40003781D970 => 0x400037a757f0)
[0][DEBUG] Switching FPU owner from task 1 to 3
[0][DEBUG] Blocking task 3
[0][DEBUG] Waking up task 3 on core 0
[0][DEBUG] Task is available.
[0][DEBUG] Blocking task 3
[0][DEBUG] Only Idle Task is available.
[0][DEBUG] Switching task from 3 to 0 (stack 0x400037A757F0 => 0x11d98)

Steps to reproduce:

$ cd examples/httpd
$ cargo build --release --target x86_64-unknown-hermit
$ mkdir tracedir
# On Debian, this is /usr/lib/qemu/virtiofsd
$ sudo /usr/libexec/virtiofsd \
    --socket-path=/tmp/vhostqemu \
    --shared-dir=$(pwd)/tracedir \
    --announce-submounts \
    --sandbox none \
    --seccomp none \
    --inode-file-handles=never &
$ sudo qemu-system-x86_64 \
    -cpu host \
    -enable-kvm \
    -smp 1 \
    -m 1024M \
    -display none \
    -serial stdio \
    -device isa-debug-exit,iobase=0xf4,iosize=0x04 \
    -netdev user,id=u1,hostfwd=tcp::9975-:9975,hostfwd=udp::9975-:9975,net=192.168.76.0/24,dhcpstart=192.168.76.9 \
    -device virtio-net-pci,netdev=u1,disable-legacy=on,packed=on,mq=on \
    -chardev socket,id=char0,path=/tmp/vhostqemu \
    -device vhost-user-fs-pci,queue-size=1024,packed=on,chardev=char0,tag=tracedir \
    -object memory-backend-file,id=mem,size=1G,mem-path=/dev/shm,share=on \
    -numa node,memdev=mem \
    -kernel /path/to/hermit-loader-x86_64 \
    -initrd ../../target/x86_64-unknown-hermit/release/httpd
# In a new terminal
$ curl localhost:9975
# Observe it hangs, hit enter in qemu and watch the request complete
# For extra fun, re-run curl afterwards and press enter again ;)

Gelbpunkt avatar Apr 11 '25 15:04 Gelbpunkt

I can reproduce this with the following simple program with hermit at 4a33adb09:

Rust program and Cargo.toml (click to expand)
use std::time::Duration;

#[cfg(target_os = "hermit")]
use hermit as _;
extern crate rftrace as _;
use rftrace_frontend as rftrace;

fn main() {
	let events = rftrace::init(1000000, true);
	rftrace::enable();

	for _ in 0..10 {
		println!("<test-thingy> waiting 10s");
		std::thread::sleep(Duration::from_secs(10));
	}

	rftrace::dump_full_uftrace(events, "/tracedir", "test-thingy").expect("Saving trace failed");
}
[package]
name = "test_thingy"
edition = "2024"

[dependencies]
rftrace = "0.2"
rftrace-frontend = "0.2"
hermit = { path = "../../hermit", default-features = false, features = [
    "fs",
    "instrument",
    "acpi",
    "pci",
] }
QEMU and virtiofsd invocations (click to expand)
virtiofsd --socket-path=/tmp/hermit-trace-virtiofs-sock --shared-dir ./tracedir --tag tracedir
cargo build -Z build-std=std,core,alloc,panic_abort --target x86_64-unknown-hermit && \
      qemu-system-x86_64 \
          -cpu Skylake-Client \
          -smp 1 \
          -m 1G \
          -device isa-debug-exit,iobase=0xf4,iosize=0x04 \
          -display none -serial stdio \
          -chardev socket,id=char0,path=/tmp/hermit-trace-virtiofs-sock \
          -device vhost-user-fs-pci,queue-size=1024,packed=on,chardev=char0,tag=tracedir \
          -object memory-backend-file,id=mem,size=1G,mem-path=/dev/shm,share=on \
          -numa node,memdev=mem \
          -kernel ../../../loader/target/release/hermit-loader-x86_64 \
          -initrd ../../target/x86_64-unknown-hermit/debug/test_thingy \
          -s -S

Everything works up until the application task is blocked waiting for 10s. After that the schedular switches into the idle task and never returns.

Inspecting this behaviour with a debugger shows that at https://github.com/hermit-os/kernel/blob/4a33adb090034810e0d2d5329aaf3e4a62d87bc9/src/scheduler/mod.rs#L124-L127 the last_stack_pointer has a sensible value whereas the new_stack_pointer has a suspiciously low value (see below image). Screenshot of the debugger variable information pane in Codium

Trying to step into switch_to_fpu_owner does not work (simply continues control flow). Continuing beyond that, hermit is unresponsive on stdin/stdout and seems to make no productive progress.

I initially debugged with QEMU using KVM, but that made breakpoints very unreliable unless forcing hardware assisted breakpoints (especially for the first breakpoint in a debugging session). Debugging with LLDB instead of GDB only works at all if using hardware breakpoints with KVM. Switching to QEMU with TCG and -cpu Skylake-Client, breaking worked reliably.

Truncated hermit log output (click to expand)
---- snip (^more mapping of page ranges above here^) ----
[0][TRACE] Mapping PageRange { start: Page[2MiB](0x400035a00000), end: Page[2MiB](0x400035c00000) } to PhysFrameRange { start: PhysFrame[2MiB](0x37a00000), end: PhysFrame[2MiB](0x37c00000) } with PageTableFlags(PRESENT | WRITABLE | NO_EXECUTE)
[0][INFO] Heap is located at 0x400000000000..0x400035c00000 (0 Bytes unmapped)
[0][INFO] Physical memory free list:
         0x2002000..         0x2200000 (len =           0x1fe000, pages =              510)
        0x37c00000..        0x3ffe0000 (len =          0x83e0000, pages =            33760)
[0][INFO] Virtual memory free list:
    0x400035c00000..    0x800000000000 (len =     0x3fffca400000, pages =      17179649024)
[0][INFO] bootargs = /home/hcsch/projects/uni/hermit/loader/target/release/hermit-loader-x86_64 
[0][INFO] 
[0][INFO] ========================== CPU INFORMATION ===========================
[0][INFO] Model:                   Intel Core Processor (Skylake)
[0][INFO] Frequency:               3061 MHz (from Measurement)
[0][INFO] SpeedStep Technology:    Not Available
[0][INFO] Features:                MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AESNI RDRAND FMA MOVBE MCE FXSR XSAVE RDTSCP CLFLUSH X2APIC HYPERVISOR AVX2 BMI1 BMI2 FSGSBASE RDSEED 
[0][INFO] Physical Address Width:  40 bits
[0][INFO] Linear Address Width:    48 bits
[0][INFO] Supports 1GiB Pages:     No
[0][INFO] ======================================================================
[0][INFO] 
[0][DEBUG] Cr0 = Cr0Flags(PROTECTED_MODE_ENABLE | MONITOR_COPROCESSOR | TASK_SWITCHED | EXTENSION_TYPE | NUMERIC_ERROR | WRITE_PROTECT | PAGING)
[0][DEBUG] Cr4 = Cr4Flags(PHYSICAL_ADDRESS_EXTENSION | MACHINE_CHECK_EXCEPTION | PAGE_GLOBAL | OSFXSR | OSXMMEXCPT_ENABLE | FSGSBASE | OSXSAVE)
[0][INFO] Hermit booted on 2025-06-12 15:54:46.286699 +00:00:00
[0][DEBUG] Scanning PCI Busses 0 to 31
[0][INFO] Found an ACPI revision 0 table at 0xF5290 with OEM ID "BOCHS "
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c00000), end: Page[4KiB](0x400035c02000) } to PhysFrameRange { start: PhysFrame[4KiB](0x3ffe2000), end: PhysFrame[4KiB](0x3ffe4000) } with PageTableFlags(PRESENT | NO_EXECUTE)
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c02000), end: Page[4KiB](0x400035c04000) } to PhysFrameRange { start: PhysFrame[4KiB](0x3ffe2000), end: PhysFrame[4KiB](0x3ffe4000) } with PageTableFlags(PRESENT | NO_EXECUTE)
[0][DEBUG] Found ACPI table: FACP
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c04000), end: Page[4KiB](0x400035c06000) } to PhysFrameRange { start: PhysFrame[4KiB](0x3ffe0000), end: PhysFrame[4KiB](0x3ffe2000) } with PageTableFlags(PRESENT | NO_EXECUTE)
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c04000), end: Page[4KiB](0x400035c07000) } to PhysFrameRange { start: PhysFrame[4KiB](0x3ffe0000), end: PhysFrame[4KiB](0x3ffe3000) } with PageTableFlags(PRESENT | NO_EXECUTE)
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c04000) before mapping.
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c05000) before mapping.
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c02000), end: Page[4KiB](0x400035c04000) } to PhysFrameRange { start: PhysFrame[4KiB](0x3ffe2000), end: PhysFrame[4KiB](0x3ffe4000) } with PageTableFlags(PRESENT | NO_EXECUTE)
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c02000) before mapping.
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c03000) before mapping.
[0][DEBUG] Found ACPI table: APIC
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c04000), end: Page[4KiB](0x400035c06000) } to PhysFrameRange { start: PhysFrame[4KiB](0x3ffe2000), end: PhysFrame[4KiB](0x3ffe4000) } with PageTableFlags(PRESENT | NO_EXECUTE)
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c04000) before mapping.
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c05000) before mapping.
[0][DEBUG] Found ACPI table: HPET
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c04000), end: Page[4KiB](0x400035c06000) } to PhysFrameRange { start: PhysFrame[4KiB](0x3ffe2000), end: PhysFrame[4KiB](0x3ffe4000) } with PageTableFlags(PRESENT | NO_EXECUTE)
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c04000) before mapping.
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c05000) before mapping.
[0][DEBUG] Found ACPI table: SRAT
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c04000), end: Page[4KiB](0x400035c06000) } to PhysFrameRange { start: PhysFrame[4KiB](0x3ffe2000), end: PhysFrame[4KiB](0x3ffe4000) } with PageTableFlags(PRESENT | NO_EXECUTE)
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c04000) before mapping.
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c05000) before mapping.
[0][DEBUG] Found ACPI table: WAET
[0][DEBUG] Found Processor Local APIC record: { acpi_processor_id: 0, apic_id: 0, flags: 1 }
[0][DEBUG] Found I/O APIC record: { id: 0, reserved: 0, address: 0xFEC00000, global_system_interrupt_base: 0 }
[0][DEBUG] Mapping IOAPIC at 0xfec00000 to virtual address 0x400035c00000
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c00000), end: Page[4KiB](0x400035c01000) } to PhysFrameRange { start: PhysFrame[4KiB](0xfec00000), end: PhysFrame[4KiB](0xfec01000) } with PageTableFlags(PRESENT | WRITABLE | NO_CACHE | NO_EXECUTE)
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c00000) before mapping.
[0][DEBUG] Enable x2APIC support
[0][DEBUG] Calibrated APIC Timer with a counter value of 124 for 1 microsecond
[0][INFO] IOAPIC v32 has 24 entries
[0][DEBUG] Disabling irq 0
[0][DEBUG] Disabling irq 2
[0][DEBUG] Register name "Timer"  for interrupt 91
[0][DEBUG] Register name "COM1"  for interrupt 4
[0][DEBUG] Creating idle task 0
[0][DEBUG] Using boot stack 0xa000
[0][DEBUG] IST1 is located at 0x400000001000
[0][DEBUG] Initializing scheduler for core 0 with idle task 0
[0][INFO] 
[0][INFO] ===================== MULTIPROCESSOR INFORMATION =====================
[0][INFO] APIC in use:             x2APIC
[0][INFO] Initialized CPUs:        1
[0][INFO] ======================================================================
[0][INFO] 
[0][INFO] Compiled with PCI support
[0][INFO] Compiled with ACPI support
[0][INFO] 
[0][INFO] ======================== PCI BUS INFORMATION =========================
[0][INFO] 00:00 Unknown Class [0600]: Unknown Vendor Unknown Device [8086:1237]
[0][INFO] 00:01 Unknown Class [0601]: Unknown Vendor Unknown Device [8086:7000]
[0][INFO] 00:02 Unknown Class [0300]: Unknown Vendor Unknown Device [1234:1111], BAR0 Memory32 { address: 0xFD000000, size: 0x1000000, prefetchable: true }, BAR2 Memory32 { address: 0xFEBF0000, size: 0x1000, prefetchable: false }
[0][INFO] 00:03 Unknown Class [0200]: Unknown Vendor Unknown Device [8086:100E], IRQ 11, BAR0 Memory32 { address: 0xFEBC0000, size: 0x20000, prefetchable: false }, BAR1 IO { port: 0xC000 }
[0][INFO] 00:04 Unknown Class [0180]: Unknown Vendor Unknown Device [1AF4:105A], IRQ 11, BAR1 Memory32 { address: 0xFEBF1000, size: 0x1000, prefetchable: false }, BAR4 Memory64 { address: 0xFE000000, size: 0x4000, prefetchable: true }
[0][INFO] ======================================================================
[0][INFO] 
[0][DEBUG] Create stacks at 0x400035c04000 with a size of 1120 KB
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c05000), end: Page[4KiB](0x400035c0d000) } to PhysFrameRange { start: PhysFrame[4KiB](0x2003000), end: PhysFrame[4KiB](0x200b000) } with PageTableFlags(PRESENT | WRITABLE | NO_EXECUTE)
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c05000) before mapping.
[0][DEBUG] Had to unmap page Page[4KiB](0x400035c06000) before mapping.
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c0e000), end: Page[4KiB](0x400035c1e000) } to PhysFrameRange { start: PhysFrame[4KiB](0x200b000), end: PhysFrame[4KiB](0x201b000) } with PageTableFlags(PRESENT | WRITABLE | NO_EXECUTE)
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035c1f000), end: Page[4KiB](0x400035d1f000) } to PhysFrameRange { start: PhysFrame[4KiB](0x201b000), end: PhysFrame[4KiB](0x211b000) } with PageTableFlags(PRESENT | WRITABLE | NO_EXECUTE)
[0][DEBUG] Creating new task 1 on core 0
[0][DEBUG] Creating task 1 with priority 2 on core 0
[0][DEBUG] Task is available.
[0][DEBUG] Switching task from 0 to 1 (stack 0x0 => 0x400035c1df60)
[0][INFO] Hermit is running on common system!
[0][INFO] Found virtio device with device id 0x105a
[0][DEBUG] Mapping bar 4 at 0xfe000000 with length 0x4000
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035d20000), end: Page[4KiB](0x400035d24000) } to PhysFrameRange { start: PhysFrame[4KiB](0xfe000000), end: PhysFrame[4KiB](0xfe004000) } with PageTableFlags(PRESENT | WRITABLE | NO_CACHE | NO_EXECUTE)
[0][DEBUG] Mapping bar 4 at 0xfe000000 with length 0x4000
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035d24000), end: Page[4KiB](0x400035d28000) } to PhysFrameRange { start: PhysFrame[4KiB](0xfe000000), end: PhysFrame[4KiB](0xfe004000) } with PageTableFlags(PRESENT | WRITABLE | NO_CACHE | NO_EXECUTE)
[0][DEBUG] Mapping bar 4 at 0xfe000000 with length 0x4000
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035d28000), end: Page[4KiB](0x400035d2c000) } to PhysFrameRange { start: PhysFrame[4KiB](0xfe000000), end: PhysFrame[4KiB](0xfe004000) } with PageTableFlags(PRESENT | WRITABLE | NO_CACHE | NO_EXECUTE)
[0][DEBUG] Mapping bar 4 at 0xfe000000 with length 0x4000
[0][TRACE] Mapping PageRange { start: Page[4KiB](0x400035d2c000), end: Page[4KiB](0x400035d30000) } to PhysFrameRange { start: PhysFrame[4KiB](0xfe000000), end: PhysFrame[4KiB](0xfe004000) } with PageTableFlags(PRESENT | WRITABLE | NO_CACHE | NO_EXECUTE)
[0][DEBUG] Feature set wanted by filesystem driver are in conformance with specification.
[0][INFO] Features have been negotiated between virtio filesystem device 105a and driver.
[0][INFO] Created SplitVq: idx=0, size=1024
[0][INFO] Created SplitVq: idx=1, size=1024
[0][INFO] Filesystem device with id 105a, has been initialized by driver!
[0][INFO] Virtio filesystem driver initialized.
[0][DEBUG] Create directory /tmp
[0][DEBUG] Create directory /proc
[0][DEBUG] Open /proc/version, OpenOption(O_RDWR | O_CREAT), AccessPermission(S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH)
[0][DEBUG] Open file /proc/version with OpenOption(O_RDWR | O_CREAT)
[0][DEBUG] Try to initialize fuse filesystem
[0][DEBUG] received interrupt 43
[0][TRACE] fuse init answer: Rsp { headers: RspHeader { out_header: fuse_out_header { len: 80, error: 0, unique: 1 }, op_header: fuse_init_out { major: 7, minor: 38, max_readahead: 0, flags: 0, max_background: 65535, congestion_threshold: 49149, max_write: 1048576, time_gran: 1, max_pages: 256, map_alignment: 0, flags2: 0, max_stack_depth: 0, unused: [0, 0, 0, 0, 0, 0] } }, payload: None }
[0][INFO] Mounting virtio-fs at /tracedir
[0][DEBUG] Mounting /tracedir
[0][INFO] Try to initialize uhyve filesystem
[0][DEBUG] Setting argv as: []
[0][DEBUG] Setting envv as: []
[0][INFO] Jumping into application
[0][TRACE] __sys_malloc: allocate memory at 0x40000001d580 (size 0x2625a00, align 0x8)
[0][TRACE] __sys_malloc: allocate memory at 0x40000001c700 (size 0x18, align 0x8)
[0][TRACE] __sys_malloc: allocate memory at 0x400002643000 (size 0x400, align 0x1)
<test-thingy> waiting 10s
[0][DEBUG] sys_usleep blocking the task for 10000000 microseconds
[0][DEBUG] Blocking task 1
[0][DEBUG] Only Idle Task is available.
[0][DEBUG] Switching task from 1 to 0 (stack 0x400035C1DF60 => 0x11938)

hcsch avatar Jun 12 '25 16:06 hcsch

So far I haven't been able to capture any traces with rftrace because of this

hcsch avatar Jun 12 '25 16:06 hcsch

It seems that a requirement for the bug is kernel instrumentation with mcount (having the instrument feature enabled) and a large enough value of max_event_count (the parameter to rftrace::init). For me 42811 still works, but 42812 and larger exhibit the above described problems.

cargo xtask ci rs --arch x86_64 --package rftrace-example qemu --accel --virtiofsd with the following patch to rftrace-example from hermit-rs reliably reproduces the issue for me:

diff --git a/examples/rftrace-example/Cargo.toml b/examples/rftrace-example/Cargo.toml
index 28be276..78334c0 100644
--- a/examples/rftrace-example/Cargo.toml
+++ b/examples/rftrace-example/Cargo.toml
@@ -11,4 +11,5 @@ rftrace-frontend = "0.2"
 hermit = { path = "../../hermit" }
 
 [features]
+default = ["instrument-kernel"]
 instrument-kernel = ["hermit/instrument"]
diff --git a/examples/rftrace-example/src/main.rs b/examples/rftrace-example/src/main.rs
index 713f470..f8bdcef 100644
--- a/examples/rftrace-example/src/main.rs
+++ b/examples/rftrace-example/src/main.rs
@@ -1,3 +1,5 @@
+use std::time::Duration;
+
 #[cfg(target_os = "hermit")]
 use hermit as _;
 
@@ -5,9 +7,11 @@ extern crate rftrace as _;
 use rftrace_frontend as rftrace;
 
 fn main() {
-       let events = rftrace::init(2000, false);
+       let events = rftrace::init(42812, false);
        rftrace::enable();
        f1();
+       println!("waiting");
+       std::thread::sleep(Duration::from_secs(1));
        std::hint::black_box(());
        // Uhyve mounts at `/host`, virtiofsd mounts to `/root`
        rftrace::dump_full_uftrace(events, "/root/tracedir", "rftrace-example").unwrap();

hcsch avatar Jun 13 '25 13:06 hcsch

The stack pointer of 71992 seems normal after all (rsp is smaller than 100_000 during pre_init for example).

I juts discovered that during (what I think is) a run of PerCoreScheduler::reschedule, the VM runs into a general protection fault and the general_protection_exception handler is called (see screenshot of call-stack). screenshot of the call stack pane in Codium

Beyond that it seems that in rftrace_mcoun_entry movq -0x50(%rax), %r8 repeatedly triggers a GP fault. And because the handler is also instrumented, this causes an infinite loop: general_protection_exceptionmcountrftrace_mcount_entrygeneral_protection_exception

hcsch avatar Jun 13 '25 14:06 hcsch