Einstein
Eliminate the MMU
On this page, Matthias outlines his overall plan for how the MMU emulation could be eliminated:
https://github.com/pguyot/Einstein/wiki/Development-Strategy
I'd love to help work on this, and have been studying how the MMU emulation works, but I'm still a newbie.
What would really help is if you could give me one concrete example of the process of:
- Identifying a code section that causes allocation of an MMU page
- Patching it so it doesn't
I know this is a lot of grunt work, but I'm good at grunt work once I know what it is I'm looking for, and what steps I need to follow. A specific beginning-to-end example would be so helpful to me.
(Unless your plan for retargeting all the ROM code eliminates the need for doing this? In which case walk me through a ROM retargeting example. :))
As you know, the MMU works by translating virtual addresses into physical addresses. The translation is performed through tables that can be modified by the OS itself.
A naive implementation consists of emulating exactly what the hardware does: for every read or write operation, read the proper values in the tables and perform the translation. The result of the translation can also be a fault, i.e. the operation is forbidden or there is no associated physical address.
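As a rough illustration of the naive approach, here is a minimal Python sketch of a two-level table walk. It is not the code in TMMU.cp: it assumes the classic ARM layout (a first-level table indexed by the top 12 address bits, 1MB sections, and coarse second-level tables holding 4KB small pages), it ignores access permissions and domains entirely, and the names `translate`, `read_word`, `TTB` and `TranslationFault` are made up for this example.

```python
class TranslationFault(Exception):
    """Raised when a virtual address has no valid mapping."""


def translate(read_word, ttb, vaddr):
    """Walk the tables for vaddr and return the physical address.

    read_word(addr) reads one 32-bit word of physical memory (hypothetical
    callback); ttb is the physical address of the first-level table.
    """
    # First level: one entry per 1MB of virtual address space.
    l1 = read_word(ttb + ((vaddr >> 20) << 2))
    kind = l1 & 0b11
    if kind == 0b10:
        # Section: the entry itself holds the 1MB physical base.
        return (l1 & 0xFFF00000) | (vaddr & 0x000FFFFF)
    if kind == 0b01:
        # Coarse table: the entry points to a second-level table
        # with one entry per 4KB of the 1MB region.
        l2_base = l1 & 0xFFFFFC00
        l2 = read_word(l2_base + (((vaddr >> 12) & 0xFF) << 2))
        if l2 & 0b11 == 0b10:
            # Small page: 4KB physical base plus the offset in the page.
            return (l2 & 0xFFFFF000) | (vaddr & 0x00000FFF)
    # Anything else is treated as a fault in this simplified sketch.
    raise TranslationFault(hex(vaddr))


# Tiny example memory image, stored as {physical address: word}.
TTB = 0x4000
memory = {
    TTB + (0x001 << 2): 0x00800000 | 0b10,          # 1MB section
    TTB + (0x002 << 2): 0x00008000 | 0b01,          # coarse table at 0x8000
    0x00008000 + (0x03 << 2): 0x00045000 | 0b10,    # 4KB small page
}


def read_word(addr):
    return memory.get(addr, 0)


print(hex(translate(read_word, TTB, 0x00112345)))
print(hex(translate(read_word, TTB, 0x00203ABC)))
```

Every emulated load or store pays for one or two extra memory reads plus the shifting and masking, which is why this path is so costly when done on every access.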
The translation code is here: https://github.com/pguyot/Einstein/blob/master/Emulator/TMMU.cp#L155
The MMU is not a superfluous feature in NewtonOS, which, like most operating systems, uses it in several ways:
- as a way to restrict some parts of the code to supervisor mode. However, this is quite primitive compared to process isolation in modern operating systems, and protection mostly concerns the first page of the ROM (at physical address 0) and the interrupt vectors;
- as a way to interact with hardware, typically PCMCIA cards, including the quite complex mechanism for handling a card that is physically removed while still needed: the OS halts and asks you to insert it back;
- as a way to implement system patches: the jump tables are patched through MMU translation (system patches are RAM pages ingeniously mapped at several virtual addresses each to modify the jump tables);
- as a way to provide active memory, including for the stack: when the stack grows beyond allocated memory, a fault occurs during the memory access (possibly in the middle of a single instruction that writes several words), then a physical page is allocated and execution resumes. I do not think, however, that NewtonOS can swap active pages out to storage as most operating systems do;
- as a way to handle packages, which are "mapped" in RAM. I believe active packages are mapped to virtual addresses, and the relevant chunks are read and decompressed on demand, generating faults if not available, and evicted from the cache when physical RAM is required elsewhere. The same mechanism exists in most current operating systems (e.g. when you mmap a file in Unix, including on a compressed file system).
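To make the fault-driven allocation concrete, here is a small Python sketch of a multi-word store that faults, gets a page allocated, and is then restarted. It is only a model under stated assumptions: in reality the fault handler is NewtonOS code running inside the emulator, whereas here it is a host-side method; the 4KB page size, the `Memory` class and `store_multiple` are invented for the example, and the whole instruction is simply replayed from its first word.

```python
PAGE_SIZE = 0x1000  # 4KB, chosen for illustration only


class PageFault(Exception):
    """Raised when a store hits a virtual page with no physical page."""

    def __init__(self, vaddr):
        super().__init__(hex(vaddr))
        self.vaddr = vaddr


class Memory:
    def __init__(self, free_pages):
        self.free_pages = list(free_pages)  # unused physical page numbers
        self.page_table = {}                # virtual page -> physical page
        self.ram = {}                       # physical address -> word

    def store_word(self, vaddr, value):
        vpn = vaddr // PAGE_SIZE
        if vpn not in self.page_table:
            raise PageFault(vaddr)
        paddr = self.page_table[vpn] * PAGE_SIZE + vaddr % PAGE_SIZE
        self.ram[paddr] = value

    def handle_fault(self, fault):
        # Stand-in for the OS handler: back the faulting page with RAM.
        self.page_table[fault.vaddr // PAGE_SIZE] = self.free_pages.pop()


def store_multiple(mem, vaddr, words):
    """Emulate one instruction that stores several consecutive words.

    If any word faults, the page is allocated and the instruction is
    replayed from the start, so it may fault more than once.
    """
    while True:
        try:
            for i, word in enumerate(words):
                mem.store_word(vaddr + 4 * i, word)
            return
        except PageFault as fault:
            mem.handle_fault(fault)


mem = Memory(free_pages=[7, 9])
# Four words straddling two unmapped pages: two faults, two allocations.
store_multiple(mem, 0x1FF8, [1, 2, 3, 4])
print(mem.page_table)
```

The point of the example is the fault in the middle of the instruction: the first two words are written twice, which is harmless for plain RAM but is exactly the kind of detail an emulator has to get right.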
The translation process is very expensive indeed. There are three paths to optimize this:
- optimize all translation logic by various means. Typically, most translations succeed and always yield the same result. This is the path followed so far in Einstein. Very early in the development of the emulator, pages for execution were translated only once: whenever the processor tries to execute code at a given virtual address, the translation is performed for the whole page. This cached translation is of course invalidated when the MMU table is modified (or the physical page is modified). Einstein also optimizes successive identical translations, i.e. when reading/writing several words at once within a single instruction. The llvm branch optimizes this further by using some code analysis to avoid redoing the translation when reading data that was written earlier in the same function, provided the function was not interrupted by a fault (i.e. we can assume the MMU tables did not change).
- let the host operating system perform the translation itself, using the host's MMU. This trick is used by several emulators. However, I ruled it out back in the day because the page size of the host operating system (a 32-bit PowerPC Mac, I believe) was too large to emulate the small pages of the ARM (the tiniest 1KB pages are used with system patches), considering the amount of RAM I had. Today, this could be solved on 64-bit CPUs by reserving a larger amount of memory, or on hosts with 1KB page sizes (e.g. ARM hosts). I think it could be done in userland with mmap, by requesting a specific virtual address and handling faults. Typically, on my Mac, the page size is 4KB and we could pre-allocate 4 times the amount of addressable RAM of NewtonOS, i.e. 16GB, with mmap, especially considering that most of this space is never accessed (typically PCMCIA cards). In fact, I just wrote a small prototype of how this works here: https://gist.github.com/pguyot/0abecd6df1c59f13384440a45bd21385
- avoid all MMU code by replacing all this as Matthias suggests. This is a very long road IMHO.
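The "4 times" figure in the second path can be checked with a little arithmetic. The Python sketch below shows one possible reading of it, not necessarily what the gist does: each 1KB guest page gets its own 4KB host page, so the guest's 4GB address space is spread over 16GB of reserved host address space. The function names are hypothetical.

```python
GUEST_PAGE = 1 << 10    # 1KB, the smallest guest page mentioned above
HOST_PAGE = 1 << 12     # 4KB, the host page size quoted above
GUEST_SPACE = 1 << 32   # 4GB of 32-bit guest virtual addresses


def host_offset(vaddr):
    """Offset of a guest virtual address inside the reserved host region.

    Each guest page starts on its own host page, so the host MMU can
    map or protect it independently of its neighbours.
    """
    page, offset = divmod(vaddr, GUEST_PAGE)
    return page * HOST_PAGE + offset


def reserved_bytes():
    """Host address space needed to give every guest page a host page."""
    return (GUEST_SPACE // GUEST_PAGE) * HOST_PAGE


print(reserved_bytes() // (1 << 30), "GB reserved")
print(hex(host_offset(0x400)))
```

One consequence of this layout is that two guest addresses on either side of a 1KB boundary are no longer adjacent on the host, so an access that crosses such a boundary has to be split.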
Hi Paul,
Thanks for the lengthy description. It was a very interesting read.
First, I do like the mmap approach very much. It's not so much the ability to cause signals for unmapped memory, but simply the ability to map the known virtual layout to 0x100000000, for example, and make reading and writing virtual memory very fast (by simply or'ing the address with 0x100000000). Since we only map a few megabytes of physical memory, we could even create a second page that holds permissions per block.
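A minimal Python sketch of that idea, just to pin down the arithmetic: the host address is the guest virtual address OR-ed with the base, and a separate table holds one permission byte per block. The 1KB block size, the `READ`/`WRITE` flags and the function names are assumptions made for this example; a real implementation would use mmap'ed memory rather than a Python bytearray.

```python
HOST_BASE = 0x1_0000_0000   # base suggested above, just past the 32-bit range
BLOCK_SIZE = 1 << 10        # assumption: permissions tracked per 1KB block

READ, WRITE = 1, 2          # hypothetical permission flags

# One byte of permission flags per block of guest virtual address space.
permissions = bytearray((1 << 32) // BLOCK_SIZE)


def host_address(vaddr):
    """Host address for a guest virtual address.

    OR is equivalent to addition here because HOST_BASE has no bits
    in common with a 32-bit address.
    """
    return HOST_BASE | (vaddr & 0xFFFFFFFF)


def allowed(vaddr, access):
    """True if the block containing vaddr grants the requested access."""
    return bool(permissions[(vaddr & 0xFFFFFFFF) // BLOCK_SIZE] & access)


# Mark one block as read-only and query it.
permissions[0x0C100800 // BLOCK_SIZE] = READ
print(hex(host_address(0x0C100800)))
print(allowed(0x0C100800, READ), allowed(0x0C100800, WRITE))
```

With 1KB blocks the permission table for the full 4GB space is 4MB, and far less if only the few megabytes actually mapped are covered.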
Secondly, we have disassembled so much of the ROM by now. Maybe a different/better/coupled approach could be to intercept all calls that modify the MMU Lookup tables, and add code that creates a parallel table that can be interpreted much faster by the emulator, instead of bit-shifting and and'ing integers around.
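Such a parallel table could look roughly like the Python sketch below. The hooks `on_map` and `on_unmap` are hypothetical stand-ins for whatever intercepted ROM calls end up modifying the MMU tables; the sketch assumes addresses and sizes are multiples of 1KB and leaves out permissions.

```python
GRANULE = 1 << 10  # 1KB, so that the smallest pages can be represented


class ShadowTable:
    """Flat lookup kept in sync by hooks on MMU table modifications."""

    def __init__(self):
        self.entries = {}  # virtual granule number -> physical base

    def on_map(self, vaddr, paddr, size):
        # Called (hypothetically) whenever the OS maps a region.
        for off in range(0, size, GRANULE):
            self.entries[(vaddr + off) // GRANULE] = paddr + off

    def on_unmap(self, vaddr, size):
        # Called (hypothetically) whenever the OS removes a mapping.
        for off in range(0, size, GRANULE):
            self.entries.pop((vaddr + off) // GRANULE, None)

    def translate(self, vaddr):
        """Physical address for vaddr, or None if it would fault."""
        base = self.entries.get(vaddr // GRANULE)
        return None if base is None else base + vaddr % GRANULE


table = ShadowTable()
table.on_map(0x00200000, 0x04000000, 0x1000)
# The same RAM page mapped at a second virtual address, as system
# patches are described to be.
table.on_map(0x00700000, 0x04000000, 0x400)
print(hex(table.translate(0x00200ABC)))
print(hex(table.translate(0x00700010)))
```

The emulator then does a single lookup per access, and the cost of decoding the real tables is paid only when they change.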