rust-hypervisor-firmware layout: Support running the firmware as a BIOS image

Depends on #37. Fixes #5.

Now you can do the following:

cargo xbuild --target target.json --release --features=rom
sstrip target/target/release/hypervisor-fw
qemu-system-x86_64 \
    -machine type=q35,accel=kvm \
    -cpu host -smp 1 -m 1024 \
    -display none -serial stdio \
    -bios target/target/release/hypervisor-fw \
    -drive id=boot,file=clear-28660-kvm.img,format=raw,if=none \
    -device virtio-blk-pci,drive=boot,disable-legacy=on

And the firmware will go from the real-mode reset vector all the way to Rust.

Nov 12 '19 11:11 josephlr

> And the firmware will go from the real-mode reset vector all the way to Rust. Driver bug is preventing boot due to VIRTIO feature negotiation.

Looking good.

The real virtio issue is that the PCI BAR address is zero, yet the virtio device still reports that zero BAR as the one to use. I think understanding why that PCI BAR is zero is the next step forward.

Nov 13 '19 13:11 rbradford

@rbradford @bonzini do the fixes and comments look good? Is there anything else you want me to do here? I could include a high-level doc file explaining the multistage boot process (rom16 -> rom32 -> ram32 -> ram64 -> Rust), if you think it's necessary.

Nov 15 '19 11:11 josephlr

@josephlr I think what you've got here is great. I do think it's a good idea to boot through to userspace before merging this, as there is obviously something missing. We have a good data point now: PVH mode with SeaBIOS does show the PCI BAR address.

Nov 15 '19 11:11 rbradford

> @josephlr I think what you've got here is great. I do think it's a good idea to boot through to userspace before merging this, as there is obviously something missing. We have a good data point now: PVH mode with SeaBIOS does show the PCI BAR address.

Seems reasonable, I'll focus on #26, and try to get that booting to user-space first, and then come back to this (once we know why it's broken).

Nov 15 '19 11:11 josephlr

@josephlr I was able to dump the registers for QEMU and for CH just before jumping to the kernel. They should be in the same state:

https://gist.github.com/rbradford/7ccc15bc2c55d6423840896a42491ac0 vs https://gist.github.com/rbradford/b452d5d5b8e3bddb39040996dede9516

I've not done a deep analysis yet, but one obvious thing stands out: CS and CR0 look different.

Nov 15 '19 12:11 rbradford

> I've not done a deep analysis yet, but one obvious thing stands out: CS and CR0 look different.

I think the CS segments being different is OK (most of the bits are ignored). The CR0 difference is more interesting. However, manually setting the CR0 and CS bits to match CH didn't seem to help (the kernel stopped in the same place).

Nov 15 '19 12:11 josephlr

In case you want to reproduce:

I modified main.rs just after "Jumping to kernel" to write to I/O port 0x80. CH has a function that prints debug messages when that port is written, so it wasn't hard to print the KVM registers. For QEMU, I modified the firmware to loop forever after the same message, ran it with "-serial mon:stdio", and used the monitor (ctrl-a c) to get the registers ("info registers").

Nov 15 '19 12:11 rbradford

Clearing CR0.CD and CR0.NW is a good idea anyway for the firmware, but it's only a matter of tidiness (disabling the cache or writeback doesn't work in VMs).
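For reference, a minimal sketch of the bit manipulation involved. CR0.NW is bit 29 and CR0.CD is bit 30; the helper name `clear_cache_disable_bits` is hypothetical and not taken from the firmware, and the actual register read and write would still have to happen in the early assembly or via inline asm.

```rust
/// CR0.NW (Not Write-through) is bit 29; CR0.CD (Cache Disable) is bit 30.
const CR0_NW: u64 = 1 << 29;
const CR0_CD: u64 = 1 << 30;

/// Returns `cr0` with CD and NW cleared; every other bit is left untouched.
/// Hypothetical helper: it only computes the new value, the caller is
/// responsible for reading and writing the real register.
pub fn clear_cache_disable_bits(cr0: u64) -> u64 {
    cr0 & !(CR0_CD | CR0_NW)
}
```

With the architectural reset value of CR0 (0x6000_0010: CD, NW and ET set), this leaves only ET set.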

Nov 15 '19 12:11 bonzini

So setting earlyprintk=serial,keep on the command line is great: you get printing from super early in boot, before Linux even decompresses itself.

So the problem is a kernel panic caused by the kernel running out of memory.

Normal boot with CH
Panicked boot with QEMU

Nov 15 '19 12:11 josephlr

Oh duh, we don't get an e820 map passed in with PVH, so we have to make it ourselves. Right now we're passing in a map with zero memory, and this makes Linux quite unhappy.

Nov 15 '19 13:11 josephlr

@josephlr Use the CMOS to get the limits and make one? CH currently has a CMOS implementation, but it might not be sticking around as nothing is currently using it. So it's best to use an E820 map where available and fall back to CMOS where it's not.

Nov 15 '19 13:11 rbradford

It looks like PVH gives us an e820 map, so we should be able to just use that.

Nov 15 '19 13:11 josephlr

So it looks like regardless of how we run the firmware (QEMU, PVH, CH, Firecracker, etc.), we eventually need to get two pieces of information from the host:

- the address of the RSDP
- the e820 table

All the other info we get from the host is optional.

We can get the info in the following way:

| Boot Method | RSDP | e820 table |
| --- | --- | --- |
| CH/Firecracker | `acpi_rsdp_addr` in `boot_params` | `e820_table` in `boot_params` |
| PVH | `rsdp_paddr` in `hvm_start_info` | `memmap_paddr` in `hvm_start_info` |
| QEMU BIOS | tables passed by QEMU | `etc/e820` fw_cfg file |
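One way to express "every boot method supplies the same two things" in code is a small trait. This is only a sketch: `BootInfo`, `E820Entry`, `FixedInfo` and `usable_ram` are hypothetical names, not existing firmware types, and a real implementation would wrap `boot_params`, `hvm_start_info` or fw_cfg instead of a fixed slice. The only outside fact used is that E820 type 1 means usable RAM.

```rust
/// E820 type value for usable RAM.
pub const E820_RAM: u32 = 1;

/// One memory range as reported by the host.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct E820Entry {
    pub addr: u64,
    pub size: u64,
    pub entry_type: u32,
}

/// Hypothetical trait: the two pieces of information every boot method must provide.
pub trait BootInfo {
    fn rsdp_addr(&self) -> Option<u64>;
    fn e820_entries(&self) -> &[E820Entry];
}

/// Stand-in for a real source such as `boot_params` or `hvm_start_info`.
pub struct FixedInfo<'a> {
    pub rsdp: Option<u64>,
    pub map: &'a [E820Entry],
}

impl<'a> BootInfo for FixedInfo<'a> {
    fn rsdp_addr(&self) -> Option<u64> {
        self.rsdp
    }

    fn e820_entries(&self) -> &[E820Entry] {
        self.map
    }
}

/// Sums the size of all RAM entries. A result of zero corresponds to the
/// "map with zero memory" case that made the kernel panic.
pub fn usable_ram(info: &dyn BootInfo) -> u64 {
    info.e820_entries()
        .iter()
        .filter(|e| e.entry_type == E820_RAM)
        .map(|e| e.size)
        .sum()
}
```

Checking `usable_ram` before jumping to the kernel would have turned the out-of-memory panic into an early, obvious firmware error.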

Nov 16 '19 02:11 josephlr

For Q35, rsdp_in_ram is set, so I think the RSDP should be findable by scanning the EBDA region for the "RSD PTR " signature, as the spec suggests. For PC, however, it's only available via QEMU FW_CFG.
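A rough sketch of what that scan could look like, operating on a byte slice rather than raw physical memory. Per the ACPI spec the structure starts with the 8-byte signature "RSD PTR " (with a trailing space), sits on a 16-byte boundary, and its first 20 bytes must sum to zero. The function name `find_rsdp` is hypothetical, and mapping the EBDA (or the 0xE0000 to 0xFFFFF BIOS area) into a slice is left to the caller.

```rust
/// Signature at the start of the RSDP structure (note the trailing space).
const RSDP_SIGNATURE: &[u8; 8] = b"RSD PTR ";

/// Length of the ACPI 1.0 part of the RSDP, which the checksum covers.
const RSDP_V1_LEN: usize = 20;

/// Scans `region` on 16-byte boundaries and returns the offset of the first
/// candidate with a matching signature whose first 20 bytes sum to zero.
pub fn find_rsdp(region: &[u8]) -> Option<usize> {
    (0..region.len())
        .step_by(16)
        .filter(|&off| off + RSDP_V1_LEN <= region.len())
        .find(|&off| {
            let candidate = &region[off..off + RSDP_V1_LEN];
            candidate.starts_with(RSDP_SIGNATURE)
                && candidate.iter().fold(0u8, |sum, b| sum.wrapping_add(*b)) == 0
        })
}
```

The returned offset is relative to the start of the slice, so the caller adds the physical base address of the scanned region to get the RSDP address.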

As we have a very simple memory model we might be able to get away with hardcoded ranges + the memory output details from the CMOS which might be easier than implementing QEMU FW_CFG.
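To make the CMOS fallback concrete, here is a sketch of the arithmetic only. It assumes the register layout that QEMU programs and SeaBIOS reads, as far as I know: registers 0x34/0x35 hold the RAM above 16 MiB (below 4 GiB) in 64 KiB units, and 0x5b/0x5c/0x5d hold the RAM above 4 GiB in 64 KiB units. Treat those register numbers as an assumption to verify; the function names are hypothetical, and reading the registers through ports 0x70/0x71 is not shown.

```rust
const KIB_64: u64 = 64 * 1024;
const MIB_16: u64 = 16 * 1024 * 1024;

/// RAM below 4 GiB in bytes, from CMOS registers 0x34 (low byte) and 0x35
/// (high byte). Assumes the guest has at least 16 MiB of RAM, since the
/// registers only count memory above that mark.
pub fn low_mem_bytes(reg34: u8, reg35: u8) -> u64 {
    let units = u64::from(u16::from_le_bytes([reg34, reg35]));
    MIB_16 + units * KIB_64
}

/// RAM above 4 GiB in bytes, from CMOS registers 0x5b (low), 0x5c (mid)
/// and 0x5d (high).
pub fn high_mem_bytes(reg5b: u8, reg5c: u8, reg5d: u8) -> u64 {
    let units = u64::from(reg5b) | (u64::from(reg5c) << 8) | (u64::from(reg5d) << 16);
    units * KIB_64
}
```

For the `-m 1024` guest from the PR description, the low registers would read 0x00 and 0x3f, which gives exactly 1 GiB, and the high registers would all be zero.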

Nov 16 '19 09:11 rbradford
