dosemu2 rfe: alias DPMI memory

As was pointed out here: https://github.com/stsp/dosemu2/pull/1057#issuecomment-601659609 , we may need to alias DPMI memory if we are serious about DPMI-jit (which IMO would be great to have). 20 years ago there was a limit on sysv shm, which by default was something like 32MB (I don't remember exactly, it could have been even less), so I just converted DPMI to an anonymous mapping. But these days we do not use sysv shm, and the limits are different. By default we have 128MB of DPMI mem. Mapping 128MB of shared mem on 64bit arches is not a big deal.
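For readers unfamiliar with the term: "aliasing" here means mapping the same shared memory at two host addresses, so that one view can be write-protected for the JIT while the other stays writable. The following is a minimal POSIX/Linux sketch of that idea, not dosemu2 code; `struct dpmi_alias`, `dpmi_alias_create()` and `dpmi_protect()` are hypothetical names, and `tmpfile()` merely stands in for whatever shared-memory backing a real implementation would use.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Two views of the same shared pages (hypothetical, for illustration). */
struct dpmi_alias {
    unsigned char *guest;  /* view the guest runs from; may be write-protected */
    unsigned char *alias;  /* view of the same pages that always stays writable */
    size_t size;
};

/* Map `size` bytes of shared memory twice. Returns 0 on success, -1 on error. */
static int dpmi_alias_create(struct dpmi_alias *a, size_t size)
{
    FILE *f = tmpfile();   /* backing object, removed automatically */
    void *m1, *m2;

    if (f == NULL)
        return -1;
    if (ftruncate(fileno(f), (off_t)size) != 0) {
        fclose(f);
        return -1;
    }
    m1 = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
    m2 = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
    fclose(f);             /* the mappings keep the pages alive */
    if (m1 == MAP_FAILED || m2 == MAP_FAILED) {
        if (m1 != MAP_FAILED)
            munmap(m1, size);
        if (m2 != MAP_FAILED)
            munmap(m2, size);
        return -1;
    }
    a->guest = m1;
    a->alias = m2;
    a->size = size;
    return 0;
}

/* Write-protect `len` bytes of the guest view starting at `off`, as a JIT
 * does for pages it has compiled code from. `off` must be page-aligned. */
static int dpmi_protect(struct dpmi_alias *a, size_t off, size_t len)
{
    return mprotect(a->guest + off, len, PROT_READ);
}
```

A write through `alias` never faults, whereas the same write through `guest` would now raise SIGSEGV, which is the fault the mprotect-based scheme has to handle.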

Mar 20 '20 15:03 stsp

I didn't keep track of the recent improvements. How does this affect performance/compatibility?

Mar 21 '20 08:03 jschwartzenberg

What exactly? Bart fixed some compatibility problems with jit. This ticket is about fixing a few more w/o sacrificing the performance.

Mar 21 '20 09:03 stsp

Do you know specific software that's affected? Maybe it's also what fixed DN2?

Mar 21 '20 09:03 jschwartzenberg

It did.

Mar 21 '20 09:03 stsp

I'm not sure it helps; there are some pros and cons. I doubt it will help performance, since page faults are already quite rare for DPMI: code and data share a page much less often than in vm86. It may help code clarity a bit, though, as you would then know that LINEAR2UNIX gives a writable address (unless it's VGA).

Another alternative would be a faultless JIT without mprotect. You absolutely need mprotect for JIT vm86 + native DPMI, but for full JIT it could be done completely in software.

Mar 22 '20 02:03 bartoldeman

I just want to find some consistent solution. Previously, because the faults are rare, I wanted to just call e_invalidate manually in a few places. But you want to build it into the macros, and those macros are also used for lowmem, not only by the DPMI subsystem. If the invalidation cases are rare, I think this is not the best approach. Either DPMI-specific macros (a len arg to SEL_ADR_W() + read/write accessors) or a faultless jit are fine with me, if the problem is too small to alias the DPMI mem just for that.

Mar 22 '20 13:03 stsp

Off-topic: https://www.youtube.com/watch?v=KRLm4mKhUVU Though I think it is some kind of trolling, not a real review of dosemu2. :)

Mar 22 '20 14:03 stsp

I don't really want to build it into the macros. Ideally I'd let SEL_ADR_X() return a dosaddr_t and not use any pointers into DOS space, using only functions instead (read_byte(), memcpy_* etc., so that only dos2linux.c uses LINEAR2UNIX etc. and nowhere else). The point is really that e_invalidate is quick and a no-op if the memory is already unprotected or is aliased, which is >99% of all cases (i.e. unprotected) for DPMI memory. I should probably implement a few counters to see exactly how many :)
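The "quick no-op" argument can be illustrated with a small model: a one-bit-per-page protection bitmap, so that invalidating an unprotected range costs one bit test per page, plus the kind of counters mentioned above. This is a hypothetical sketch assuming 4 KiB pages and a 32-bit DOS address space; `prot_bitmap`, `inv_stats` and `e_invalidate_sketch()` are illustrative names, not dosemu2's actual e_invalidate().

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t dosaddr_t;

#define PAGE_SHIFT 12                          /* assumes 4 KiB guest pages */
#define NUM_PAGES  (1u << (32 - PAGE_SHIFT))   /* 4 GiB / 4 KiB = 1M pages */

/* One bit per guest page: set while the JIT keeps the page write-protected. */
static uint8_t prot_bitmap[NUM_PAGES / 8];

/* Counters to measure how often the slow path is really taken. */
static struct { unsigned long noop, slow; } inv_stats;

static int page_protected(uint32_t page)
{
    return (prot_bitmap[page >> 3] >> (page & 7)) & 1;
}

static void set_page_protected(uint32_t page, int on)
{
    if (on)
        prot_bitmap[page >> 3] |= (uint8_t)(1u << (page & 7));
    else
        prot_bitmap[page >> 3] &= (uint8_t)~(1u << (page & 7));
}

/* Hypothetical model of e_invalidate(): one bitmap test per page, and a
 * no-op (apart from the counter) when none of the pages is protected. */
static void e_invalidate_sketch(dosaddr_t addr, size_t len)
{
    uint64_t first, last, p;
    int hit = 0;

    if (len == 0)
        return;
    first = addr >> PAGE_SHIFT;
    last = ((uint64_t)addr + len - 1) >> PAGE_SHIFT;
    if (last >= NUM_PAGES)
        last = NUM_PAGES - 1;                  /* ignore anything past 4 GiB */
    for (p = first; p <= last; p++) {
        if (page_protected((uint32_t)p)) {
            /* Real code would discard the compiled code from page p here. */
            set_page_protected((uint32_t)p, 0);
            hit = 1;
        }
    }
    if (hit)
        inv_stats.slow++;
    else
        inv_stats.noop++;
}
```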

Mar 22 '20 15:03 bartoldeman

But read_byte(), memcpy_* and all that normally use aliased pointers. If you force them to always unprotect, then the cases where unprotecting is not needed (because the protection can stay, not because there was no protection at all) will become slower. Perhaps we then need to decide to stop using the aliased space without unprotecting first. But if we can use the aliased space without unprotecting, then I suppose coding the unprotect into the access macros is not good.

Mar 22 '20 16:03 stsp

No, I don't want to force them to unprotect; that is indeed not good! Just do what memcpy_2dos (which takes a dosaddr_t; the aliased pointer is hidden inside the function) does now, i.e.:

1. Check the bitmap; if unprotected, write as usual (aliased or not doesn't matter), return.
2. If protected, then:
   a. check via another bitmap if it hits code; if so, invalidate that code;
   b. if VGA memory, call the VGA functions, return;
   c. if aliased lowmem, write to the alias, return;
   d. if DPMI memory, unprotect and write, return.
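The write path described in the list can be modelled as one dispatch function taking a dosaddr_t. This is a sketch under stated assumptions: only the first 4 MiB are modelled, the code bitmap is per page rather than finer-grained, the end of aliased low memory (`LOWMEM_END`) is a guess, and all names (`memcpy_2dos_sketch()`, `vga_write()`, `invalidate_code()`, etc.) are hypothetical; only the VGA window at 0xA0000-0xBFFFF is the standard PC location.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t dosaddr_t;

#define PAGE_SHIFT 12
#define MEM_SIZE   (4u << 20)       /* model only the first 4 MiB */
#define NPAGES     (MEM_SIZE >> PAGE_SHIFT)
#define VGA_START  0xA0000u         /* standard PC VGA window ... */
#define VGA_END    0xC0000u         /* ... 0xA0000-0xBFFFF */
#define LOWMEM_END 0x110000u        /* assumed end of aliased low memory */

static unsigned char mem[MEM_SIZE];      /* stands in for the host mapping */
static uint8_t write_protected[NPAGES];  /* 1 = page is mprotect'ed */
static uint8_t has_code[NPAGES];         /* 1 = JIT has code from this page */

enum write_path { W_DIRECT, W_VGA, W_ALIAS, W_UNPROTECT };

/* Hypothetical hooks; the real ones would live in the JIT and in vgaemu. */
static void invalidate_code(uint32_t page) { has_code[page] = 0; }
static void vga_write(dosaddr_t addr, const void *src, size_t n)
{
    (void)addr; (void)src; (void)n;     /* would go through the VGA emulation */
}

/* Assumes [addr, addr + n) lies inside one page of the modelled memory. */
static enum write_path memcpy_2dos_sketch(dosaddr_t addr, const void *src, size_t n)
{
    uint32_t page = addr >> PAGE_SHIFT;

    /* 1. Unprotected: write as usual, aliased or not. */
    if (!write_protected[page]) {
        memcpy(&mem[addr], src, n);
        return W_DIRECT;
    }
    /* 2a. Protected and the page holds compiled code: invalidate it. */
    if (has_code[page])
        invalidate_code(page);
    /* 2b. VGA memory: hand the write to the VGA emulation. */
    if (addr >= VGA_START && addr < VGA_END) {
        vga_write(addr, src, n);
        return W_VGA;
    }
    /* 2c. Aliased lowmem: write through the alias; the protection stays. */
    if (addr < LOWMEM_END) {
        memcpy(&mem[addr], src, n);     /* the alias shares these pages */
        return W_ALIAS;
    }
    /* 2d. DPMI memory has no alias: unprotect, then write. */
    write_protected[page] = 0;
    memcpy(&mem[addr], src, n);
    return W_UNPROTECT;
}
```

The return value only exists so the chosen path can be checked; the point is that only case 2.d touches the protection, while 2.a and 2.b are caught without handing out any pointer into DOS space.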

Now I am arguing that 2.d. is a fairly rare case, and we now catch it with dosemu_error if we fail to call e_invalidate.

The main issue with the current approach of simply using the pointers into aliased memory and writing directly to them without checking is that you don't catch 2.a. and 2.b.

For 2.a. that can be problematic if you write data over code and the JIT is never told that this code needs invalidating (most likely a leak); 2.b. is less likely to happen, since you don't really want to put filenames into VGA memory, but you never know what crazy things DOS programs do...

Mar 22 '20 18:03 bartoldeman

But is also either you hit code or not, i.e. it's not just one case. The fact that it is treated as one bothers me. Unnecessary unprotects.

Mar 22 '20 21:03 stsp

Or perhaps you mean that 2.d without a code hit almost never happens? (Insert "2.d" after the word "but" in my previous message.)

Mar 22 '20 22:03 stsp

Correct, I mean the latter, since pages in DPMI programs typically don't mix code and data... Of course there is some self-modifying code. I'll get some stats later.

Mar 22 '20 23:03 bartoldeman

Maybe you meant the latter, but your patches suggest you had data and code mixed in. Or was that not the case?

Mar 22 '20 23:03 stsp

Correct, there was code and data mixed in, and that faulted about 10 times. That's nothing compared to the thousands of other faults that happen... but like I said, I'll need more stats, also for the case when DPMI code faults and the CPatch unprotects.

Mar 23 '20 02:03 bartoldeman

So what are you going to do with linear2unix then? Ban it completely? I'd much rather see the size of the patch for that than any fault stats. :)

Mar 23 '20 09:03 stsp

The patch would be enormous indeed :(

I'm warming up more and more to the idea of the fault-less JIT. I experimented a bit more with using the CPatches unconditionally and dosemu boots just as fast as before (or even slightly faster).

For DPMI the case of Causeway is a little extreme since that one indeed mixes code and data on a single page (I checked the source code) and produces ~20,000 page faults, which now means code regeneration, and a delay of 0.8 seconds (about 40 microseconds per fault). That is completely eliminated if this is done in software.

And that's without much optimization of the software checks. There is an interesting blog post about this at http://www.emulators.com/docs/nx08_stlb.htm . Essentially, this technique can be used to implement a cache that quickly checks whether a 256-byte section of memory is safe to write to; e.g. for dwords it would look like:

table_index = (addr >> 8) & 0xff; /* this is the hash */
if (((pagetbl[table_index].addr_page ^ (addr + 3)) & 0xFFFF0300) == 0)
    /* hit: same 256-byte block, writable, dword does not spill over */
    *(uint32_t *)(pagetbl[table_index].alias + (addr & 0xff)) = x;
else {
    /* expensive way, check for VGA, update pagetbl etc. */
}

where, e.g. for address 0x12345678, the addr_page field (on a hit) at index 0x56 has the value 0x12345600 if it's writable memory and the value 0x12345600 ^ 0x200 = 0x12345400 if it's not writable. The alias field then contains LINEAR2UNIX(DOSADDR_REL(0x12345600)).
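The check can be exercised in isolation. The sketch below, assuming a 256-entry table that some slower path fills and a plain byte buffer in place of the real LINEAR2UNIX() mapping, shows why one XOR-and-mask covers three conditions: bits 16-31 must match the cached block, bit 9 is the flipped "not writable" marker, and bit 8 catches a dword that spills into the next 256-byte block (adding 3 across the boundary always flips bit 8). Bits 10-15 need no check because they are part of the table index. `stlb_init()`, `stlb_fill()` and `stlb_write_dword()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STLB_SIZE    256
#define STLB_NOWRITE 0x200u           /* flipped bit marks "not writable" */

struct stlb_entry {
    uint32_t addr_page;               /* block base, XOR 0x200 if read-only */
    unsigned char *alias;             /* host address of the block base */
};

static struct stlb_entry stlb[STLB_SIZE];

/* Start with every entry marked not writable, so every lookup misses. */
static void stlb_init(void)
{
    for (uint32_t i = 0; i < STLB_SIZE; i++) {
        stlb[i].addr_page = (i << 8) ^ STLB_NOWRITE;
        stlb[i].alias = NULL;
    }
}

/* Record the 256-byte block containing addr; host points at its first byte. */
static void stlb_fill(uint32_t addr, int writable, unsigned char *host)
{
    struct stlb_entry *e = &stlb[(addr >> 8) & 0xff];
    e->addr_page = (addr & ~0xffu) ^ (writable ? 0 : STLB_NOWRITE);
    e->alias = host;
}

/* Fast path: returns 1 if the dword was written, 0 if the caller must take
 * the expensive way (VGA, protected code, refill, ...). */
static int stlb_write_dword(uint32_t addr, uint32_t x)
{
    const struct stlb_entry *e = &stlb[(addr >> 8) & 0xff];

    if (((e->addr_page ^ (addr + 3)) & 0xFFFF0300u) != 0)
        return 0;
    memcpy(e->alias + (addr & 0xff), &x, sizeof x);  /* unaligned-safe store */
    return 1;
}
```

Starting every entry out as "not writable" makes an empty table miss on every lookup without needing a separate valid bit.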

This will take a bit of time to implement properly, but not needing to deal with page faults is tempting. One would still need to protect code and use the alias for writes for jit+native or jit+kvm, but not for full jit.

Mar 24 '20 12:03 bartoldeman

Yes, the attempt to ban linear2unix will likely be comparable to converting mem_base32 to macros, which took a few years. But at least in the case of mem_base32 everyone was confident it was the right thing to do, while removing linear2unix just for an experiment... not sure.

As for the faultless jit, I also fail to see the logic. If you claim there are almost no faults in DPMI code except for the cw extender, then how would that not become a pessimization?

OTOH I do not feel that aliasing the DPMI space is the best or the only possible approach. At least it's simple, so my current thinking is either that or don't touch it. :)

Mar 24 '20 13:03 stsp

So given that the cw extender mixes code and data (and I really think other "people" do too, just not as frequently as in real mode), I suppose that even if you ban linear2unix, you still can't count on fault==code_hit, in which case you still need an alias.

So it looks like we have 3 possible strategies:

1. Alias DPMI space. Optionally ban linear2unix; leaving it in won't hurt in this case, at least not more than it already does. Pros: simple; consistent with what we have for lowmem. Cons: wastes 128MB of shm, which may not be worth the problem we are trying to solve.
2. Always unprotect, assuming fault==code_hit. Pros: technically appealing. Cons: pessimization when fault!=code_hit; not consistent with what we have now for lowmem; needs to either ban linear2unix (2.a, difficult), or only ban DPMI-specific variations of it, like SEL_ADR_X() (2.b, simple).
3. Fault-less jit. Pros: portable. Cons: difficult; slow. I think it is slow because jit faults in DPMI should be rare, regardless of whether it's a code hit or not. So you trade rare faults for extra logic in the generated memory access code.

Now that this is all written out, it's much simpler to think about. I think it would be very cool to have 3 as just another emulation option, for portability, but it's not something that is needed right now. I also think that implementing 2.b doesn't hurt, as one can later also implement 1 (or something else!), and 2.b will take advantage of it by no longer assuming fault==code_hit. What to do with linear2unix in general can be decided at any later stage.

So... 2.b? :)

Mar 24 '20 15:03 stsp

I am happy about 2.b. for sure, however do note that SEL_ADR_X is also used for read-addresses and sometimes it uses strcmp on those. Introducing SEL_ADR_W and then using memcpy_2dos on the write cases would fix that (and we need to invalidate the code if it's writing on top of code, also if page protection is switched off in 3)

However do note that the vast majority of faults come from the JIT itself; within DPMI, that code is CPatch'ed and then the CPatch unprotects the page.

As for 3, it's actually a very simple patch, and it performs surprisingly well (it just fails for Windows because the page faults on the LDT aren't correctly handled any more, and the same goes for general DPMI page faults as tested in test-i386.exe). Sure, it adds some cycles to every write, but those are snowed under by everything else that happens. But I'm all for making it optional, any ideas for a runtime option? $_cpu_emu = "softmmu" or "softJIT"? (In #564 I proposed to use "JIT" and "simulated" as possible values, but I don't think I ever got round to that...)

Mar 24 '20 21:03 bartoldeman

> I am happy about 2.b. for sure, however do note that SEL_ADR_X is also used for read-addresses and sometimes it uses strcmp on those.

Yes, I think the strategy should be to make SEL_ADR_X() return a dosaddr_t (which you suggested initially), rather than to introduce SEL_ADR_W() or similar. I am convincing myself more and more that fully switching to dosaddr_t is a future-proof move, so why not start with DPMI and get back to linear2unix sometime later. The reason is simple: linear2unix (and so SEL_ADR_X()), even when working with aliased space, may leave you with an unnoticed code hit sooner or later. But the biggest problem for me with fully accepting that view is also simple: you are safe with KVM, native or vm86. So it means changing such a core thing for something that is not even enabled by default. :( It will also likely give a slight performance hit to kvm/native/vm86, and will definitely complicate the code (you won't be able to use a pointer to a struct in DOS memory). So I think a good compromise is to switch the DPMI stuff to dosaddr_t and see, and perhaps not do the same for linear2unix. :) Which is 2.b.
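To make the trade-off concrete: with a dosaddr_t-returning SEL_ADR_X(), even the strcmp() cases mentioned above have to go through accessors instead of dereferencing a pointer. A hypothetical sketch, with a toy 64 KiB memory, an 8-entry descriptor table and made-up names (`sel_adr_sketch()`, `read_byte_sketch()`, `write_byte_sketch()`, `dos_strcmp_sketch()`) that are not dosemu2's API:

```c
#include <stdint.h>

typedef uint32_t dosaddr_t;

#define MEM_SIZE 0x10000u
static uint8_t dos_mem[MEM_SIZE];       /* stands in for DOS/DPMI memory */
static dosaddr_t sel_base[8];           /* hypothetical descriptor bases */

/* Hypothetical SEL_ADR_X() variant returning a dosaddr_t, not a pointer.
 * Bits 3-15 of a selector index the descriptor table; TI/RPL are ignored. */
static dosaddr_t sel_adr_sketch(uint16_t sel, uint32_t off)
{
    return sel_base[(sel >> 3) & 7] + off;
}

/* Only accessors like these know how a dosaddr_t maps to host memory.
 * This model wraps at MEM_SIZE instead of faulting. */
static uint8_t read_byte_sketch(dosaddr_t a) { return dos_mem[a % MEM_SIZE]; }
static void write_byte_sketch(dosaddr_t a, uint8_t v) { dos_mem[a % MEM_SIZE] = v; }

/* strcmp() on a string in DOS memory without a host pointer into DOS space. */
static int dos_strcmp_sketch(dosaddr_t a, const char *s)
{
    for (;; a++, s++) {
        uint8_t c = read_byte_sketch(a);
        uint8_t d = (uint8_t)*s;
        if (c != d)
            return c < d ? -1 : 1;
        if (c == 0)
            return 0;
    }
}
```

This is also where the cost shows: code that could previously cast a pointer to a struct in DOS memory would need accessor calls like these instead.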

> However do note that the vast majority of faults come from the JIT itself

You mean, !InCompiledCode? What would that be?

> any ideas for a runtime option?

Perhaps another option, like $_cpu_jit_mode? The problem is, we already have a separate set of bugs for different simx86 modes. There would be even more. :)

Mar 24 '20 22:03 stsp

Bart, what about the fault-less sim (not jit)? That would allow us to work under Windows 10. sigcontext is not properly supported there, so the whole siglongjmp() trick doesn't work.

Mar 25 '20 18:03 stsp

That should already work except for Windows 3.x LDT writes and DPMI page faults (fairly uncommon but tested in test-i386.exe). Basically the write routines in the sim need to be modified to check for those addresses (and a TLB-like structure as explained above would help there too)

Mar 25 '20 19:03 bartoldeman

> That should already work except for Windows 3.x LDT writes and DPMI page faults (fairly uncommon but tested in test-i386.exe).

And vgaemu?

The last time we discussed that (https://github.com/stsp/dosemu2/pull/149#discussion_r56493644), you were against the idea. But I think it's time to put it forward, as dosemu can in fact now (somewhat) work on Windows.

Mar 25 '20 22:03 stsp

Off-topic: I created the dosemu2 organization at github and invited people there. I have zero idea what that means, I just wanted to see what happens. You may want to accept or deny the invitation, and I don't know what will change when you do. :)

Mar 30 '20 21:03 stsp

And I also created the dosemu2 project. So we now have a repository, an organization and a project... hmm.

Mar 30 '20 21:03 stsp

And now we seem to have new URLs: instead of github.com/stsp/* it is now github.com/dosemu2/*. So, in particular, the dosemu2 repository is now https://github.com/dosemu2/dosemu2

Oh crap, what have I done... But it looks like the former URLs work too. There is some kind of redirect.

Mar 30 '20 21:03 stsp

So I am no longer a project owner, but just a collaborator. :) How cool.

Mar 30 '20 21:03 stsp

Faultless JIT works with vgaemu and should work with LDTs too, but not page faults, although I have a proof of concept stashed somewhere for that case, after which I returned to the simulator.

As for my former comment from 2016, the "TLB" will revert any slowdowns that I added to the simulator now, a quick test gave it a 99.7% hit rate when booting DOS.

Mar 30 '20 22:03 bartoldeman

> Faultless JIT works with vgaemu and should work with LDTs too, but not page faults, although I have a proof of concept stashed somewhere for that case, after which I returned to the simulator.

How does faultless jit handle the code self-modifications? "softtlb" tricks in the compiled code?

> As for my former comment from 2016, the "TLB" will revert any slowdowns that I added to the simulator now, a quick test gave it a 99.7% hit rate when booting DOS.

What kind of slowdown have you added to the simulator? A few checks for VGA/LDT memory don't count as a slowdown, I suppose?

Mar 30 '20 23:03 stsp
