
KVM: Rudimentary support for VCPI

Open bartoldeman opened this issue 3 years ago • 55 comments

Implement basic VCPI support with limitations. The limitations are:

  • $_dpmi = (0)
  • $_ems = (0)
  • don't use FDPP
  • use external XMS provider (HIMEM/FDXMS)

VCPI clients allocate memory from XMS only, so EMS needs to report 0 memory, and XMS needs to map identical "physical" memory starting at 0x110000, which would normally be taken by DPMI memory.

The page table maps LOW+HMA as usual, then a page at 0x110000 to monitor code implementing a monitor->VCPI client jump and a VCPI client -> VM86 jump. A page at 0x111000 contains the saved monitor GDTR/IDTR/CR3 values and a temporary stack, with hiword(esp)=0, for the client to use. The monitor code is much like Jemm's.
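As a rough illustration of the layout described above, the addresses can be written down as constants. This is only a sketch: the macro and function names are made up for this example and are not taken from the dosemu2 sources. The 4 MB limit reflects the VCPI requirement, mentioned later in this thread, that the shared pages sit in the first 4 MB of linear space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical, chosen for this sketch only. */
#define PAGE_SIZE_4K     0x1000u
#define LOWMEM_SIZE      0x100000u                       /* first 1 MB */
#define HMA_SIZE         0x10000u                        /* 64 KB high memory area */
#define MONITOR_CODE     (LOWMEM_SIZE + HMA_SIZE)        /* mode switch code page */
#define MONITOR_DATA     (MONITOR_CODE + PAGE_SIZE_4K)   /* saved GDTR/IDTR/CR3 + temp stack */
#define VCPI_SHARED_END  0x400000u                       /* end of the first 4 MB */

static_assert(MONITOR_CODE == 0x110000u, "code page follows LOW+HMA");
static_assert(MONITOR_DATA == 0x111000u, "data page follows the code page");

/* True if a whole 4K page starting at addr lies inside the first 4 MB. */
bool page_in_shared_window(uint32_t addr)
{
    return addr < VCPI_SHARED_END && VCPI_SHARED_END - addr >= PAGE_SIZE_4K;
}

/* The temporary stack must be usable with a 16-bit stack pointer,
 * i.e. the high word of ESP must be zero. */
bool esp_hiword_is_zero(uint32_t esp)
{
    return (esp >> 16) == 0;
}
```

Both monitor pages pass `page_in_shared_window()`, while anything at or above 0x400000 does not.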

When KVM interrupts the VCPI client through a signal, DOSEMU does its regular things, but CANNOT modify any registers, as the client registers stay in the VM. pic_run() will see VIF is not set, so won't modify, and the kvm.c code needs to use KVM_INTERRUPT if pic_pending(). The only other place that has been adapted for a callback into vm86 is leavedos().
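The delivery rule in the paragraph above can be sketched as a small decision function. This is a simplification with hypothetical names: it does not reproduce the actual pic_run() or kvm.c logic, and it reduces the VIF check to a single "VCPI client active" flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; a simplified model, not the dosemu2 implementation. */
enum irq_delivery {
    IRQ_DELIVER_NONE,          /* nothing pending */
    IRQ_DELIVER_VM86,          /* usual path: DOSEMU edits the vm86 registers */
    IRQ_DELIVER_KVM_INTERRUPT  /* registers stay in the VM: inject via KVM */
};

static_assert(IRQ_DELIVER_NONE == 0, "zero means nothing to deliver");

enum irq_delivery choose_irq_delivery(bool vcpi_client_active, bool irq_pending)
{
    if (!irq_pending)
        return IRQ_DELIVER_NONE;
    /* While a VCPI client runs, its registers must not be modified from
     * outside, so the interrupt has to be injected into the guest. */
    return vcpi_client_active ? IRQ_DELIVER_KVM_INTERRUPT : IRQ_DELIVER_VM86;
}
```

In the real code the `IRQ_DELIVER_KVM_INTERRUPT` case would correspond to issuing the KVM_INTERRUPT ioctl on the vCPU.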

VGAEMU memory is unprotected for now, so updates aren't reliable, and VGA planar modes don't work.

DOOM is playable but a bit slow due to frequent VGA I/O port accesses which would normally go via instremu. Duke Nukem 3D runs but with choppy sound and incomplete screen updates.

bartoldeman avatar Jan 18 '23 00:01 bartoldeman

VCPI clients allocate memory from XMS only, so EMS needs to report 0 memory, and XMS needs to map identical "physical" memory starting at 0x110000, which would normally be taken by DPMI memory.

I reverted that optimization in #1881. Please let me know if this is what you are looking for - that should allow vcpi to work with internal XMS.

Other than that - quite interesting, but could you please describe exactly what this gives us? win95? Also you likely want to ruin the arm port, don't you? :) Or maybe you want to move to tcg, allowing the arm port to provide vcpi too?

stsp avatar Jan 18 '23 07:01 stsp

Have you considered using KVM_CAP_SYNC_REGS, as in #1408? Unless I am missing the actual problem you are facing, with SYNC_REGS the aforementioned limitations will go away because you will always have the right registers at hand. The monitor then doesn't have to save/restore them.

stsp avatar Jan 18 '23 09:01 stsp

VGAEMU memory is unprotected for now, so updates aren't reliable, and VGA planar modes don't work.

We still have the vgacpy branch in #1099. Hmm, looks like we have many solutions around, waiting for a problem. :) And here it is, a problem.

stsp avatar Jan 18 '23 09:01 stsp

The page table maps LOW+HMA as usual, then a page at 0x110000 to monitor code

Can you put it somewhere upwards, into the "main_pool"?

stsp avatar Jan 18 '23 09:01 stsp

By dirtying all pages. Can be improved using KVM_GET_DIRTY_LOG.

As in #198 ?

stsp avatar Jan 18 '23 09:01 stsp

At the moment this is mostly a playground to see what is possible with a VCPI client having full control over the VM, and to test some ideas and perhaps find some bugs. It also allows quickly checking how DPMI provider X behaves vs. dosemu2's DPMI implementation. Some of the ideas could then go into devel before this is merged.

This is for the moment a KVM only thing, nothing against Arm, just a bonus when KVM is there for now.

Sure, Windows 95 and various games that need VCPI or flat real mode would be a bonus.

The page table maps LOW+HMA as usual, then a page at 0x110000 to monitor code

Can you put it somewhere upwards, into the "main_pool"?

How do you mean, moving the code and data page from the monitor to two pages allocated from the main pool? I mean the whole monitor can be allocated from the main pool if we like now. Then we'd only have a single mmap for DOS addressable space.

VCPI spec says they must be mapped into the first 4MB (linear), could be in low space (e.g. I could put the VCPI<->VM86 switch code in bios.S), or mapped from somewhere high (which is what I'm doing now, monitor is high)

Note that before VCPI, KVM always identity maps, from guest virtual space to guest physical space. This keeps it consistent with native DPMI. But with VCPI the client has control over page tables so we only have control over physical space except (collaboratively!) for the first 1MB + 64k + up to 3 MB of virtual address space. That's why DPMI had to be disabled in the config, so that extmem is at physical (not linear) 0x110000.

As in https://github.com/dosemu2/dosemu2/pull/198 ?

Yes.

bartoldeman avatar Jan 18 '23 15:01 bartoldeman

That's why DPMI had to be disabled in the config, so that extmem is at physical (not linear) 0x110000.

This part I don't understand. dpmi_base is far above ext_mem. They are quite unrelated. With the reverts I did in a nearby branch, ext_mem is going to be mapped to the physical 0x110000 under KVM by means of EPT. So what is the problem?

stsp avatar Jan 18 '23 16:01 stsp

ext_mem is aliased to some high space in main_pool, and to 0x110000. Under KVM, both these windows actually represent the physical addresses. And the linear addresses too, if we set up the identity page tables. But I don't see any connection with dpmi, could you please clarify?

stsp avatar Jan 18 '23 16:01 stsp

Oh, I may be slightly out of date, as I see now that dpmi_base's default was moved from 8MB to 32MB, which leaves max 31MB for extmem now.

Still for HX, 0x400000 (at 4MB linear) needs to be available for DPMI right, see https://github.com/dosemu2/dosemu2/issues/612? So there's a conflict with extmem (if it uses more than 3MB, and mapped at 0x110000 linear) or is this handled somehow?
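The size limits mentioned here follow from simple address arithmetic, sketched below. The helper names are invented for this example. Assuming extmem starts right after LOW+HMA at 0x110000, the room below 4 MB is 0x2F0000 bytes (just under 3 MB), and the room below a 32 MB dpmi_base is 0x1EF0000 bytes (just under 31 MB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper names; the addresses are the ones from the discussion. */
#define EXTMEM_BASE 0x110000u   /* extmem starts right after LOW + HMA */

static_assert(EXTMEM_BASE == 0x100000u + 0x10000u, "1 MB + 64 KB");

/* Largest extmem size in bytes that still ends at or below `limit`. */
uint32_t max_extmem_below(uint32_t limit)
{
    return limit > EXTMEM_BASE ? limit - EXTMEM_BASE : 0;
}

/* True if extmem of the given size would run into a region at `limit`. */
bool extmem_conflicts(uint32_t extmem_size, uint32_t limit)
{
    return extmem_size > max_extmem_below(limit);
}
```

With these helpers, a full 3 MB of extmem already overlaps the 0x400000 region that HX needs, while 2 MB does not.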

bartoldeman avatar Jan 18 '23 16:01 bartoldeman

Yeah, that's right, and that part I forgot to revert. :) Added the missing reverts now. Also I am not sure ext_mem is actually mapped where it should be, even with reverts, because the MAPPING_INIT_LOWRAM alias still seems to be done w/o ext_mem size in low_mem_init(). But that's trivial to fix. So I suppose perhaps you don't need to disable dpmi fully, just break HX when vcpi is switched on?

stsp avatar Jan 18 '23 16:01 stsp

I added the patch that is supposed to map it to kvm's phys addr now for real. :)

stsp avatar Jan 18 '23 17:01 stsp

So there's a conflict with extmem (if it uses more than 3MB, and mapped at 0x110000 linear) or is this handled somehow?

In devel this is handled by mapping ext_mem to higher addresses. Which is the problem for vcpi, as under kvm that "higher address" is a phys address. In xms_rv branch this is handled by lowering the ext_mem size... But in a guest we don't need to map ext_mem to any linear address. So perhaps the better fix would be to leave things as they are in devel and invent non-identity maps for kvm? But that looks too tricky.

stsp avatar Jan 18 '23 17:01 stsp

Well I guess the easiest solution for now would be for you to just integrate the xms_rv branch into yours, w/o pushing it to devel. This will allow you to play with 3Mb of ext_mem, or increase it at the cost of breaking HX, and later come up with some other solution.

stsp avatar Jan 18 '23 17:01 stsp

Hmm, perhaps I've found a simple solution: https://github.com/dosemu2/dosemu2/pull/1881/commits/57c667bf080002ebf1b137a74797c8cf1ac88a61 Why not just use dpmi_rsv_low as ext_mem?

stsp avatar Jan 18 '23 17:01 stsp

Except that then we shouldn't write-protect reserved area, so https://github.com/dosemu2/dosemu2/pull/1881/commits/1e62ebffec042890e5587d8d75884ae49736e279

stsp avatar Jan 18 '23 18:01 stsp

xms_rv built, and I think it's quite simple to apply to devel. Would you like to check that it actually helps vcpi?

stsp avatar Jan 18 '23 19:01 stsp

I merged identity mapping for ext_mem, so hopefully disabling DPMI will not be needed for VCPI.

stsp avatar Jan 20 '23 13:01 stsp

This definitely makes this easier though now I need to introduce config.vcpi obviously :) Will do so in a few days when I have time.

bartoldeman avatar Jan 21 '23 02:01 bartoldeman

Oh, would you like to complete the simx86 and fpu activities maybe? Why can't vcpi wait? :)

stsp avatar Jan 21 '23 08:01 stsp

DOOM is playable but a bit slow due to frequent VGA I/O port accesses which would normally go via instremu. Duke Nukem 3D runs but with choppy sound and incomplete screen updates.

Doom runs very slow. Duke 3D actually runs very well for me.

jharrison03 avatar Feb 06 '23 06:02 jharrison03

doom is slow and duke is ok even without this patch. vcpi hardly makes a difference here.

stsp avatar Feb 06 '23 06:02 stsp

So there seems to be some "market" for this actually, e.g. #875. It is about run286, which would be quite difficult to run on our DPMI impl, even if the reverse-engineered stuff from dosbox exists. It relies on a particular descriptor layout (expects LDT/IDT in certain slots IIRC), and has crippled int21 translation.

Another thing is myjemm - ems extensions. Should be much simpler to implement if anyone ever cared.

With VCPI, both would hopefully just work.

stsp avatar Feb 09 '23 15:02 stsp

Still very much a draft, but a little less restrictive and more functional:

  • $_dpmi no longer needs to be (0) but in that case DOS progs need to be told explicitly to use VCPI if you want them to (e.g. running them explicitly with cwsdpmi.exe (DJGPP) or set DOS16M=11 (DOS4GW))
  • Still needs $_ems=(4), an artificially low value, so that EMS int67 is activated but won't let any memory be allocated, forcing VCPI apps to allocate via XMS.
  • planar VGA modes work, at least for now, via coalesced IO, a KVM feature which puts mmio/port writes in a buffer which can later be processed. This is much faster than a VMEXIT for every i/o or write but quite a bit slower than instremu (as they are still trapped inside the VM). Using instremu would imply teaching it about paging.
  • works with FDPP now
  • doom/duke3d/jazz work and can be played, but sound is choppy.
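The coalesced IO idea from the list above (trapped writes are queued in a buffer and applied later in one go) can be illustrated with a small ring buffer. This is a simplified stand-in with hypothetical names, not KVM's actual ring structure or the dosemu2 code; in particular it stores single bytes and applies them straight to a flat buffer, whereas real planar VGA writes go through the VGA latch/plane logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ring; not KVM's real data structure. */
#define RING_SLOTS 64u

struct pending_write {
    uint32_t addr;    /* offset into the emulated video memory */
    uint8_t  value;   /* byte that the guest wrote */
};

struct write_ring {
    uint32_t first;   /* next slot to consume */
    uint32_t last;    /* next slot to fill */
    struct pending_write slot[RING_SLOTS];
};

/* Producer side: queue a write. Returns 0 if the ring is full, in which
 * case the caller would have to drain before retrying. */
int ring_push(struct write_ring *r, uint32_t addr, uint8_t value)
{
    assert(r != NULL);
    uint32_t next = (r->last + 1u) % RING_SLOTS;
    if (next == r->first)
        return 0;
    r->slot[r->last].addr = addr;
    r->slot[r->last].value = value;
    r->last = next;
    return 1;
}

/* Consumer side: apply all queued writes in order and return how many
 * entries were consumed. Out-of-range addresses are skipped. */
size_t ring_drain(struct write_ring *r, uint8_t *vram, size_t vram_size)
{
    assert(r != NULL && vram != NULL);
    size_t n = 0;
    while (r->first != r->last) {
        const struct pending_write *w = &r->slot[r->first];
        if (w->addr < vram_size)
            vram[w->addr] = w->value;
        r->first = (r->first + 1u) % RING_SLOTS;
        n++;
    }
    return n;
}
```

The benefit comes from draining many queued writes per exit. A read from the same memory cannot be served from the queue, which is why the read/write ratio matters for how well batching pays off.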

bartoldeman avatar Feb 14 '23 13:02 bartoldeman

Interesting. So I suppose, at least from a configuration POV, only $_ems prevents this from merging in? It already works with default $_dpmi setting and with fdpp, so that sounds good. Will it not require lowering $_ems in the future?

stsp avatar Feb 14 '23 14:02 stsp

planar VGA modes work, at least for now, via coalesced IO, a KVM feature which puts mmio/port writes in a buffer which can later be processed.

This is cool but IIRC doom (and most other planar-mode games) read from mmio just as frequently as they write. So batching doesn't actually work, or does it?

stsp avatar Feb 14 '23 14:02 stsp

I'll double check but IIRC writes are much more frequent than reads to planar VGA memory.

I'm working on eliminating the EMS requirement, by implementing interfaces int67 0xde03/de04/de05 properly (they can just get some extmem/XMS, no need to alias map, it's just to get the accounting right, since programs expect to be able to allocate pages via those interfaces if EMS int67 ah=0x42 reports >0 memory).

JEMM/EMM386 have a NOEMS switch, which is emulated with $_ems = (4): since EMS pages are 16k, this gets rounded to 0. But with the above it would be better not to need that hack.
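The rounding can be seen with a one-line calculation. This sketch assumes that the $_ems value is a size in KB and that the conversion to pages truncates; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EMS_PAGE_KB 16u   /* an EMS page is 16 KB */

static_assert(EMS_PAGE_KB == 16u, "EMS pages are 16 KB");

/* Number of whole EMS pages in ems_kb kilobytes (assumed truncating). */
uint32_t ems_pages_from_kb(uint32_t ems_kb)
{
    return ems_kb / EMS_PAGE_KB;
}
```

So 4 KB yields 0 pages, which is why int67 is present but reports no allocatable EMS memory, while 16 KB would yield exactly one page.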

bartoldeman avatar Feb 14 '23 18:02 bartoldeman

I'll double check but IIRC writes are much more frequent than reads to planar VGA memory.

I was thinking about batching on top of instremu. In theory, if you can create large enough write batches, then you can submit them to a different thread. But my conclusion was that you need really large batches for this to have any effect, and frequent reading was ruining that idea. But maybe your findings will be different. Overall, finding a way to speed up planar modes, even mildly, would be very cool. Currently, I am afraid, we have the slowest planar mode emulation among all competitors, including dosemu1. :(

stsp avatar Feb 14 '23 18:02 stsp

QEMU with KVM uses coalesced MMIO/PIO for planar VGA from what I could see, though it's likely faster in TCG mode for that. Will still look at that later.

I've implemented the memory allocation functions now; they operate on the memory area at 16M, so there is no overlap with XMS and EXTMEM. I should still add a configuration parameter which forcibly disables VCPI, and deal with overlaps for VCPI/DPMI memory, and add checks ensuring VCPI/DPMI are never running at the same time. But it's getting much closer now.
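For orientation, the three VCPI calls in question (DE03: get the number of free 4K pages, DE04: allocate a 4K page, DE05: free a 4K page) amount to a simple page allocator. The sketch below is not the code from this branch: the names, the pool size and the byte-per-page bookkeeping are assumptions made for the example, and only the 16 MB base address comes from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names and pool size; only the 16 MB base is from the thread. */
#define VCPI_POOL_BASE  0x1000000u   /* pool starts at 16 MB */
#define VCPI_PAGE_SIZE  0x1000u      /* VCPI hands out 4K pages */
#define VCPI_POOL_PAGES 256u         /* assumed pool size: 1 MB */

struct vcpi_pool {
    uint8_t  used[VCPI_POOL_PAGES];  /* 1 if the page is allocated */
    uint32_t free_count;
};

void vcpi_pool_init(struct vcpi_pool *p)
{
    assert(p != NULL);
    memset(p->used, 0, sizeof p->used);
    p->free_count = VCPI_POOL_PAGES;
}

/* DE03-style query: number of free 4K pages. */
uint32_t vcpi_free_pages(const struct vcpi_pool *p)
{
    assert(p != NULL);
    return p->free_count;
}

/* DE04-style allocation: physical address of a page, or 0 if none is left. */
uint32_t vcpi_alloc_page(struct vcpi_pool *p)
{
    assert(p != NULL);
    for (uint32_t i = 0; i < VCPI_POOL_PAGES; i++) {
        if (!p->used[i]) {
            p->used[i] = 1;
            p->free_count--;
            return VCPI_POOL_BASE + i * VCPI_PAGE_SIZE;
        }
    }
    return 0;
}

/* DE05-style release: 0 on success, -1 if phys is not an allocated page. */
int vcpi_free_page(struct vcpi_pool *p, uint32_t phys)
{
    assert(p != NULL);
    if (phys < VCPI_POOL_BASE || (phys - VCPI_POOL_BASE) % VCPI_PAGE_SIZE != 0)
        return -1;
    uint32_t i = (phys - VCPI_POOL_BASE) / VCPI_PAGE_SIZE;
    if (i >= VCPI_POOL_PAGES || !p->used[i])
        return -1;
    p->used[i] = 0;
    p->free_count++;
    return 0;
}
```

Keeping the pool in its own address range, separate from XMS and EXTMEM, means the accounting never has to be reconciled with the other allocators.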

bartoldeman avatar Feb 15 '23 03:02 bartoldeman

Sounds cool, thanks! Can you check that the Origin games are running? This is the primary (perhaps the only) motivator for having vcpi.

stsp avatar Feb 15 '23 04:02 stsp

QEMU with KVM uses coalesced MMIO/PIO for planar VGA from what I could see, though it's likely faster in TCG mode for that.

Possibly, but here planar modes are even slower with simx86... So with planar modes we suck on all fronts, including jit.

stsp avatar Feb 15 '23 08:02 stsp