rv32emu Preliminary support for MMU emulation

The purpose of this commit is to prepare for booting 32-bit RISC-V Linux in the future. The virtual memory scheme to support is Sv32. One change to the original code base is needed to accommodate the MMU: the prototype of the riscv_io_t interface has to change so that a RISC-V instance (riscv_t) is passed as the first parameter. MMU-related callbacks need to access the satp CSR to perform a page table walk during virtual memory translation, but the satp CSR is stored in the RISC-V instance (riscv_t), so the callbacks need a way to reach it. The simplest solution is to add the RISC-V instance (riscv_t) to the prototype of the riscv_io_t interface.

After this change, riscv_io_t can be reused for system emulation.
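To make the interface change concrete, here is a minimal sketch of an I/O callback table whose callbacks take the core instance as their first parameter. All names (demo_riscv_t, demo_io_t, demo_translate) are hypothetical stand-ins and not the actual rv32emu definitions, and the "translation" is only a placeholder for a real page table walk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real rv32emu definitions. */
typedef struct demo_riscv demo_riscv_t;

/* Every callback receives the core instance first, so an MMU-aware
 * implementation can read CSRs such as satp. */
typedef struct {
    uint32_t (*mem_read_w)(demo_riscv_t *rv, uint32_t addr);
    void (*mem_write_w)(demo_riscv_t *rv, uint32_t addr, uint32_t val);
} demo_io_t;

struct demo_riscv {
    uint32_t csr_satp; /* bit 31 set means translation is enabled */
    uint8_t ram[64];
    demo_io_t io;
};

/* Placeholder for a real Sv32 walk: shifts addresses by 16 bytes when
 * satp enables translation (an assumption made only for this demo). */
uint32_t demo_translate(const demo_riscv_t *rv, uint32_t addr)
{
    return (rv->csr_satp >> 31) ? addr + 16 : addr;
}

uint32_t demo_read_w(demo_riscv_t *rv, uint32_t addr)
{
    uint32_t pa = demo_translate(rv, addr), val;
    assert(pa + sizeof(val) <= sizeof(rv->ram));
    memcpy(&val, rv->ram + pa, sizeof(val));
    return val;
}

void demo_write_w(demo_riscv_t *rv, uint32_t addr, uint32_t val)
{
    uint32_t pa = demo_translate(rv, addr);
    assert(pa + sizeof(val) <= sizeof(rv->ram));
    memcpy(rv->ram + pa, &val, sizeof(val));
}
```

Without the instance parameter, demo_read_w would have no way to consult csr_satp, which is the motivation given above for changing the prototype.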

The rest of the changes implement the Sv32 virtual memory scheme. Every memory access has to walk the page table to get the corresponding PTE. Depending on the retrieved PTE, several kinds of page fault may have to be handled, so three exception handlers have been introduced: insn_pgfault, load_pgfault, and store_pgfault. They are used in MMU_CHECK_FAULT. In this commit, access faults are not handled well since they are related to PMA and PMP, and they might not be required to boot 32-bit RISC-V Linux (tested on semu). More PTE, S-mode, and M-mode CSR helper macros are introduced as well.
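For readers unfamiliar with Sv32, the following is a rough sketch of the two-level walk described above. It is not the code from this PR: the memory model (demo_mem_t) and the function name are assumptions, and per-access permission checks (R/W/X against the access type, the U bit, A/D bits) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_V (1u << 0)
#define PTE_R (1u << 1)
#define PTE_W (1u << 2)
#define PTE_X (1u << 3)
#define PAGE_SHIFT 12
#define SATP_MODE_SV32 (1u << 31)

/* Hypothetical physical memory: 32-bit words starting at address 0. */
typedef struct {
    uint32_t *words;
    uint32_t n_words;
} demo_mem_t;

/* Returns true and stores the physical address on success; returns
 * false when the caller would have to raise a page fault. */
bool demo_sv32_walk(const demo_mem_t *mem, uint32_t satp, uint32_t va,
                    uint64_t *pa)
{
    assert(mem && pa);
    if (!(satp & SATP_MODE_SV32)) { /* bare mode: no translation */
        *pa = va;
        return true;
    }
    uint64_t table = (uint64_t) (satp & 0x3fffff) << PAGE_SHIFT;
    for (int level = 1; level >= 0; level--) {
        uint32_t vpn = (va >> (PAGE_SHIFT + 10 * level)) & 0x3ff;
        uint64_t idx = (table + vpn * 4) / 4;
        if (idx >= mem->n_words) /* simplified: treated as a fault */
            return false;
        uint32_t pte = mem->words[idx];
        /* Invalid entry, or the reserved W-without-R encoding. */
        if (!(pte & PTE_V) || (!(pte & PTE_R) && (pte & PTE_W)))
            return false;
        uint32_t ppn = pte >> 10;
        if (pte & (PTE_R | PTE_X)) { /* leaf PTE */
            uint32_t off_mask = level ? 0x3fffff : 0xfff;
            if (level && (ppn & 0x3ff)) /* misaligned 4 MiB superpage */
                return false;
            *pa = ((uint64_t) ppn << PAGE_SHIFT) | (va & off_mask);
            return true;
        }
        table = (uint64_t) ppn << PAGE_SHIFT; /* next-level table */
    }
    return false; /* level-0 entry was not a leaf */
}
```

A false return is where handlers such as insn_pgfault, load_pgfault, or store_pgfault would come into play, depending on the access type.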

Related: #310

May 12 '24 16:05 ChinYikMing

This PR is not fully ready to be merged since the testing is not yet fully designed. I opened the PR early to get feedback for further design.

May 12 '24 16:05 ChinYikMing

The initial design mentioned here does not fully consider that CSRs such as satp need to be accessed during MMU translation. During implementation, the interface has to be changed to accommodate MMU translation.

May 12 '24 16:05 ChinYikMing

This PR is not fully ready to be merged since the testing is not yet fully designed. I opened the PR early to get feedback for further design.

How can we test the MMU specific operations?

May 13 '24 04:05 jserv

This PR is not fully ready to be merged since the testing is not yet fully designed. I opened the PR early to get feedback for further design.

How can we test the MMU specific operations?

The testing idea can be broken down into the following steps:

1. Create a simple userspace application and a simple supervisor-mode kernel.
2. Start executing in the simple kernel. Read and write CSRs to install the exception vector table at a specific address for traps, and to install the root page table for MMU translation.
3. After all CSR setup is done, switch to user mode and execute the userspace application. At this point, I would like to design some scenarios to test all three types of page fault (instruction, load, and store page fault). Dumping the page table on every userspace memory access could be beneficial for verification or debugging.

If I am on the wrong path, please correct me.

It takes time to design this testing, so I will try to support emulation of other peripherals, such as PLIC, at the same time.

May 13 '24 15:05 ChinYikMing

Create a simple userspace application and a simple supervisor-mode kernel.

Start executing in the simple kernel. Read and write CSRs to install the exception vector table at a specific address for traps, and to install the root page table for MMU translation.

After all CSR setup is done, switch to user mode and execute the userspace application. At this point, I would like to design some scenarios to test all three types of page fault (instruction, load, and store page fault). Dumping the page table on every userspace memory access could be beneficial for verification or debugging.

The above sounds great. I expect a lean and reasonably straightforward approach, as follows:

May 13 '24 17:05 jserv

During block emulation, I think the instructions are executed sequentially until the block ends. As such, when a page fault exception is raised during block emulation, the RISC-V core has to jump to the corresponding exception handler. The potential problem is that even though the PC can be updated in an exception handler, the block being emulated is not updated, so the page fault cannot be handled properly.

I have tested calling rv_step after updating the PC to the exception handler to work around this. It works in gdb but not outside gdb. Is there a recommended way to solve this?

Jun 03 '24 06:06 ChinYikMing

During block emulation, I think the instructions are executed sequentially until the block ends. As such, when a page fault exception is raised during block emulation, the RISC-V core has to jump to the corresponding exception handler. The potential problem is that even though the PC can be updated in an exception handler, the block being emulated is not updated, so the page fault cannot be handled properly.

I have tested calling rv_step after updating the PC to the exception handler to work around this. It works in gdb but not outside gdb. Is there a recommended way to solve this?

Can you provide a minimal reproducible example so that @qwe661234 can verify if the current block chaining is functioning as expected?

Jun 03 '24 06:06 jserv

Steps to reproduce the VM test:

make ENABLE_SYSTEM=1
Go to the tests/system directory, run make
build/rv32emu tests/system/vm.elf

Some output would look like this:

delegated to supervisor
fault addr: 0x4
new PC: 0x800000b0
next insn addr: 0x4, next insn: 0xfe010113
delegated to supervisor
fault addr: 0x8
new PC: 0x800000b0
next insn addr: 0x8, next insn: 0x112e23
delegated to supervisor
fault addr: 0xc
new PC: 0x800000b0
next insn addr: 0xc, next insn: 0x812c23
delegated to supervisor
fault addr: 0x10
new PC: 0x800000b0
next insn addr: 0x10, next insn: 0x2010413
delegated to supervisor
fault addr: 0x14
new PC: 0x800000b0
next insn addr: 0x14, next insn: 0x6400793
delegated to supervisor
fault addr: 0x18
new PC: 0x800000b0
next insn addr: 0x18, next insn: 0xfef42623
delegated to supervisor
fault addr: 0x1c
new PC: 0x800000b0
next insn addr: 0x1c, next insn: 0xc800793
delegated to supervisor
fault addr: 0x20
new PC: 0x800000b0
next insn addr: 0x20, next insn: 0xfef42423
delegated to supervisor
fault addr: 0x24
new PC: 0x800000b0
next insn addr: 0x24, next insn: 0xfec42703
delegated to supervisor
fault addr: 0x28
new PC: 0x800000b0
next insn addr: 0x28, next insn: 0xfe842783
delegated to supervisor
fault addr: 0x2c
new PC: 0x800000b0
next insn addr: 0x2c, next insn: 0xf707b3
delegated to supervisor
fault addr: 0x30
new PC: 0x800000b0
next insn addr: 0x30, next insn: 0xfef42223
delegated to supervisor
fault addr: 0x34
new PC: 0x800000b0
next insn addr: 0x34, next insn: 0x100513
delegated to supervisor
fault addr: 0x38
new PC: 0x800000b0
next insn addr: 0x38, next insn: 0x80000097
delegated to supervisor
fault addr: 0x3c
new PC: 0x800000b0
next insn addr: 0x3c, next insn: 0x64080e7
next insn addr: 0x8000009c, next insn: 0x5d00893
next insn addr: 0x800000a0, next insn: 0x73
a0: 1
exit syscall called
inferior exit code 1

Notice that the user space code starts at address "0x4" and the address "0x800000b0" is the supervisor exception handler entry.

When an instruction fetch fault occurs at address "0x4", the PC is updated to "0x800000b0", but the next instruction is still fetched from address "0x4". I think it should be fetched from address "0x800000b0". The relevant output is as follows:

delegated to supervisor
fault addr: 0x4
new PC: 0x800000b0
next insn addr: 0x4, next insn: 0xfe010113

The subsequent instruction addresses ("0x8", "0xc", ...) face the same problem.

Jun 03 '24 09:06 ChinYikMing

Steps to reproduce the VM test:

make ENABLE_SYSTEM=1

Go to the tests/system directory, run make

build/rv32emu tests/system/vm.elf

At first glance, it appears that the MMU was not set in tests/system/vm.c, and exceptions are delegated to S-mode. Could you show the expected flow for exception handling?

Jun 03 '24 09:06 jserv

next insn addr: 0x4

Could you provide your printf format? I want to map the fields in riscv_t.

Jun 03 '24 09:06 qwe661234

Steps to reproduce the VM test:

make ENABLE_SYSTEM=1

Go to the tests/system directory, run make

build/rv32emu tests/system/vm.elf

At first glance, it appears that the MMU was not set in tests/system/vm.c, and exceptions are delegated to S-mode. Could you show the expected flow for exception handling?

Please ignore the "vm.c" file. The MMU setup is done in "vm_setup.c".

/* Enable paging */
uintptr_t satp_val = ((pte_t) &l1pt >> PG_SHIFT) | SV32_MODE;
write_csr(satp, satp_val);

The expected flow for exception handling is:

1. Search for the corresponding PTE in the page table via mmu_walk.
2. If the PTE is not found, the corresponding page fault exception is raised. In this case, it should be an instruction fetch page fault.
3. The RISC-V core checks whether the exception is delegated to S-mode. If so, the PC is set to stvec; otherwise it is set to mtvec. Whether Direct mode or Vectored mode is used depends on the implementation. In this case, the exception is delegated to S-mode, so the PC is set to stvec. The sepc CSR saves the address at which execution should resume, which the sret instruction uses to resume execution seamlessly once the page fault is handled. The page fault handler maps 4 KiB of data during handling.
4. The instruction addresses "0x8", "0xc", ... should not cause an instruction page fault since 4 KiB of data has been mapped.
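The delegation step in the flow above can be sketched roughly as follows. This is a simplified model and not the rv32emu implementation: demo_hart_t and demo_take_exception are hypothetical names, and mstatus/sstatus updates (SPP, SPIE, and so on) are omitted. The cause numbers follow the RISC-V privileged specification.

```c
#include <assert.h>
#include <stdint.h>

enum { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };
enum {
    CAUSE_INSN_PGFAULT = 12,
    CAUSE_LOAD_PGFAULT = 13,
    CAUSE_STORE_PGFAULT = 15,
};

/* Hypothetical, reduced hart state for illustration only. */
typedef struct {
    uint32_t pc, priv;
    uint32_t medeleg;
    uint32_t stvec, sepc, scause, stval;
    uint32_t mtvec, mepc, mcause, mtval;
} demo_hart_t;

/* Deliver a synchronous exception raised at h->pc. tval carries the
 * faulting virtual address for page faults. */
void demo_take_exception(demo_hart_t *h, uint32_t cause, uint32_t tval)
{
    assert(cause < 32);
    if (h->priv != PRIV_M && ((h->medeleg >> cause) & 1)) {
        /* Delegated: handled in S-mode. */
        h->sepc = h->pc; /* sret resumes here after handling */
        h->scause = cause;
        h->stval = tval;
        h->priv = PRIV_S;
        h->pc = h->stvec & ~3u; /* exceptions always use the base */
    } else {
        /* Not delegated: handled in M-mode. */
        h->mepc = h->pc;
        h->mcause = cause;
        h->mtval = tval;
        h->priv = PRIV_M;
        h->pc = h->mtvec & ~3u;
    }
}
```

With medeleg bit 12 set and stvec at 0x800000b0, a fetch fault at 0x4 from user mode would leave pc at 0x800000b0 and sepc at 0x4, which matches the "new PC" lines in the log.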

Jun 03 '24 10:06 ChinYikMing

next insn addr: 0x4

Could you provide your printf format? I want to map the fields in riscv_t.

Sure. Here it is: "next insn addr: 0x%x, next insn: 0x%x\n".

Jun 03 '24 10:06 ChinYikMing

next insn addr: 0x4

Could you provide your printf format? I want to map the fields in riscv_t.

Sure. Here it is: "next insn addr: 0x%x, next insn: 0x%x\n".

Is the variable rv->PC?

Jun 03 '24 10:06 qwe661234

next insn addr: 0x4

Could you provide your printf format? I want to map the fields in riscv_t.

Sure. Here it is: "next insn addr: 0x%x, next insn: 0x%x\n".

Is the variable rv->PC?

printf("next insn addr: 0x%x, next insn: 0x%x\n", block->pc_end, insn);. Please see line 691 in src/emulate.c.

Jun 03 '24 10:06 ChinYikMing

When instruction fetch fault occurs at address "0x4", the PC is updated to "0x800000b0" but the next instruction is still from address "0x4" and I think it should be from address "0x800000b0". The relevant information is as follows:

I want to confirm whether your expected execution is 0x4 -> 0x800000b0 -> ... -> 0x8 -> 0x800000b0 -> ..., or something else. Because the ifetch address is based on block->pc_end, if you don't modify this value, it will fetch the next instructions at "0x8", "0xc", and so on.

Jun 03 '24 13:06 qwe661234

When instruction fetch fault occurs at address "0x4", the PC is updated to "0x800000b0" but the next instruction is still from address "0x4" and I think it should be from address "0x800000b0". The relevant information is as follows:

I want to confirm whether your expected execution is 0x4 -> 0x800000b0 -> ... -> 0x8 -> 0x800000b0 -> ..., or something else. Because the ifetch address is based on block->pc_end, if you don't modify this value, it will fetch the next instructions at "0x8", "0xc", and so on.

My expected execution flow is 0x4 -> 0x800000b0 -> 0x800000b4 -> 0x800000b8 -> ... until reaching "sret", which returns to 0x4. Shall I modify the value to create this execution flow? In other words, is modifying the PC in the exception handler not enough to create this execution flow?

Jun 03 '24 13:06 ChinYikMing

Shall I modify the value to create this execution flow? In other words, is modifying the PC in the exception handler not enough to create this execution flow?

Yes, you need to modify block->pc_end so that the next instruction to be fetched becomes 0x800000b0. If you look at block_translate, it only reads the start PC, in block->pc_start = block->pc_end = rv->PC;. Therefore, modifying rv->PC in the exception handler does not influence the next instruction in block_translate.
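A toy model may help show why updating rv->PC alone has no effect. The sketch below is not the real block_translate; the types and the fetch hook are hypothetical, and only the pc_start/pc_end/PC naming follows the discussion. The point is that the PC is read once at the start, and the loop then advances pc_end on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced versions of the emulator state. */
typedef struct {
    uint32_t PC;
} demo_rv_t;

typedef struct {
    uint32_t pc_start, pc_end;
    uint32_t fetched[8]; /* addresses fetched while translating */
    int n;
} demo_block_t;

/* Called on every fetch; stands in for the MMU fault check. */
typedef void (*demo_fetch_hook_t)(demo_rv_t *rv, uint32_t addr);

void demo_block_translate(demo_rv_t *rv, demo_block_t *b, int n_insn,
                          demo_fetch_hook_t hook)
{
    assert(rv && b && hook);
    b->pc_start = b->pc_end = rv->PC; /* rv->PC is read only here */
    b->n = 0;
    while (b->n < n_insn && b->n < 8) {
        hook(rv, b->pc_end); /* may redirect rv->PC */
        b->fetched[b->n++] = b->pc_end;
        b->pc_end += 4; /* the next fetch ignores rv->PC */
    }
}

/* Simulates a page fault handler that redirects the PC to stvec. */
void demo_fault_hook(demo_rv_t *rv, uint32_t addr)
{
    (void) addr;
    rv->PC = 0x800000b0u;
}
```

Starting from PC 0x4, the toy block still records 0x4, 0x8, 0xc even though the hook redirects the PC on every fetch, which mirrors the log shown earlier in the thread.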

Jun 03 '24 13:06 qwe661234

Shall I modify the value to create this execution flow? In other words, is modifying the PC in the exception handler not enough to create this execution flow?

Yes, you need to modify block->pc_end so that the next instruction to be fetched becomes 0x800000b0. If you look at block_translate, it only reads the start PC, in block->pc_start = block->pc_end = rv->PC;. Therefore, modifying rv->PC in the exception handler does not influence the next instruction in block_translate.

Thanks for clarifying! But I found that if I find or translate a block (block_find_or_translate) after updating the PC in the exception handler, the corresponding block is generated, so it might not be necessary to modify block->pc_end.

In line 1477:

if (!pte && rv->csr_satp) { /* not found, then map it in handler */    \
            rv_inter_except_##pgfault(rv, addr);                               \
            printf("fault addr: 0x%x\n", addr); \
            printf("new PC: 0x%x\n", rv->PC); \
+          block_t *block = block_find_or_translate(rv); \
+          assert(block); \
            return true;                                                       \
        }                                                                      \

These are the logs with the new block generated:

delegated to supervisor
fault addr: 0x4
new PC: 0x800000b0
next insn addr: 0x800000b0, next insn: 0x14051573
next insn addr: 0x800000b4, next insn: 0x152023
next insn addr: 0x800000b8, next insn: 0x252223
next insn addr: 0x800000bc, next insn: 0x352423
next insn addr: 0x800000c0, next insn: 0x452623
next insn addr: 0x800000c4, next insn: 0x552823
next insn addr: 0x800000c8, next insn: 0x652a23
next insn addr: 0x800000cc, next insn: 0x752c23
next insn addr: 0x800000d0, next insn: 0x852e23
next insn addr: 0x800000d4, next insn: 0x2952023
next insn addr: 0x800000d8, next insn: 0x2b52423
next insn addr: 0x800000dc, next insn: 0x2c52623
next insn addr: 0x800000e0, next insn: 0x2d52823
next insn addr: 0x800000e4, next insn: 0x2e52a23
next insn addr: 0x800000e8, next insn: 0x2f52c23
next insn addr: 0x800000ec, next insn: 0x3052e23
next insn addr: 0x800000f0, next insn: 0x5152023
next insn addr: 0x800000f4, next insn: 0x5252223
next insn addr: 0x800000f8, next insn: 0x5352423
next insn addr: 0x800000fc, next insn: 0x5452623
next insn addr: 0x80000100, next insn: 0x5552823
next insn addr: 0x80000104, next insn: 0x5652a23
next insn addr: 0x80000108, next insn: 0x5752c23
next insn addr: 0x8000010c, next insn: 0x5852e23
next insn addr: 0x80000110, next insn: 0x7952023
next insn addr: 0x80000114, next insn: 0x7a52223
next insn addr: 0x80000118, next insn: 0x7b52423
next insn addr: 0x8000011c, next insn: 0x7c52623
next insn addr: 0x80000120, next insn: 0x7d52823
next insn addr: 0x80000124, next insn: 0x7452a23
next insn addr: 0x80000128, next insn: 0x7f52c23
next insn addr: 0x8000012c, next insn: 0x140512f3
next insn addr: 0x80000130, next insn: 0x2552223
next insn addr: 0x80000134, next insn: 0x100022f3
next insn addr: 0x80000138, next insn: 0x6552e23
next insn addr: 0x8000013c, next insn: 0x141022f3
next insn addr: 0x80000140, next insn: 0x8552023
next insn addr: 0x80000144, next insn: 0x143022f3
next insn addr: 0x80000148, next insn: 0x8552223
next insn addr: 0x8000014c, next insn: 0x142022f3
next insn addr: 0x80000150, next insn: 0x8552423
next insn addr: 0x80000154, next insn: 0x3990006f
next insn addr: 0x4, next insn: 0xfe010113

Although the new block is generated, the block currently being emulated is still the one that starts at 0x4 -> 0x8 -> 0xc -> ... . In other words, rv_step still emulates the old block. Do we need to make some changes in order to execute the newly generated block?

Jun 03 '24 14:06 ChinYikMing

Actually, my idea is to modify block->pc_end to 0x800000b0, but we cannot pass the block to the exception handler. This issue is difficult because the exception occurs during translation. However, I want to ask why this page fault exception is handled after the execution of 0x04. The page fault occurs when the instruction is fetched. Typically, the operating system handles the page fault first, retrieves the necessary page, and then fetches the instruction. Therefore, my expected execution flow is 0x800000b0 -> 0x800000b4 -> 0x800000b8 -> ... -> 0x4 -> 0x8 -> ...

Jun 03 '24 15:06 qwe661234

Actually, my idea is to modify block->pc_end to 0x800000b0, but we cannot pass the block to the exception handler. This issue is difficult because the exception occurs during translation. However, I want to ask why this page fault exception is handled after the execution of 0x04. The page fault occurs when the instruction is fetched. Typically, the operating system handles the page fault first, retrieves the necessary page, and then fetches the instruction. Therefore, my expected execution flow is 0x800000b0 -> 0x800000b4 -> 0x800000b8 -> ... -> 0x4 -> 0x8 -> ...

If the execution flow is 0x800000b0 -> 0x800000b4 -> 0x800000b8 -> ... -> 0x4 -> 0x8 -> ..., no instruction fetch page fault will be generated, since the instructions are already loaded into RAM. In order to emulate the fetch page fault, I think the flow should be 0x4 -> 0x800000b0 -> 0x800000b4 -> 0x800000b8 -> ... -> page fault handled -> 0x4 -> 0x8 -> .... Could you comment on this? @jserv

Jun 04 '24 00:06 ChinYikMing

If the execution flow is 0x800000b0 -> 0x800000b4 -> 0x800000b8 -> ... -> 0x4 -> 0x8 -> ..., no instruction fetch page fault will be generated, since the instructions are already loaded into RAM. In order to emulate the fetch page fault, I think the flow should be 0x4 -> 0x800000b0 -> 0x800000b4 -> 0x800000b8 -> ... -> page fault handled -> 0x4 -> 0x8 -> ...

In both the interpreter and the JIT compiler, we have to escape from the chained blocks if an exception is raised. Check QEMU's Translator Internals for details.

Jun 04 '24 03:06 jserv

If the execution flow is 0x800000b0 -> 0x800000b4 -> 0x800000b8 -> ... -> 0x4 -> 0x8 -> ..., no instruction fetch page fault will be generated, since the instructions are already loaded into RAM. In order to emulate the fetch page fault, I think the flow should be 0x4 -> 0x800000b0 -> 0x800000b4 -> 0x800000b8 -> ... -> page fault handled -> 0x4 -> 0x8 -> ...

In both the interpreter and the JIT compiler, we have to escape from the chained blocks if an exception is raised. Check QEMU's Translator Internals for details.

Maybe we could add a mechanism that suspends the block currently being emulated (which might require saving some state) and jumps to the exception handler block. Resuming the previous block afterward would perhaps address this problem. I am wondering whether this approach causes any side effects.

Jun 05 '24 05:06 ChinYikMing

Following Translator Internals, I also read some of the QEMU source code. I realized that some helper functions are required to support exception handling, namely raise_mmu_exception and cpu_loop_exit_restore. By the way, the latter is defined in cpu-common.h, so I think it is a generic function shared by multiple kinds of CPU.

In this section, we can see that if the MMU generates a fault, the former is called and then the latter is called. We may need to provide a similar mechanism to support exception handling.
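One way to escape from a block in the middle of execution, in the spirit of what QEMU does with a non-local jump back to its CPU loop, is setjmp/longjmp. The sketch below is only an illustration under that assumption; demo_cpu_t, demo_raise, demo_exec_block, and demo_run_once are hypothetical names and not rv32emu or QEMU APIs.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

/* Hypothetical CPU state for illustration only. */
typedef struct {
    uint32_t PC;
    jmp_buf exit_env; /* where to land when a block is abandoned */
    int insn_executed;
} demo_cpu_t;

/* Redirect the PC to the handler and unwind to the dispatch loop. */
void demo_raise(demo_cpu_t *cpu, uint32_t handler)
{
    cpu->PC = handler;
    longjmp(cpu->exit_env, 1);
}

/* Executes up to four "instructions"; a fetch at fault_pc faults. */
void demo_exec_block(demo_cpu_t *cpu, uint32_t fault_pc, uint32_t handler)
{
    for (int i = 0; i < 4; i++) {
        if (cpu->PC == fault_pc)
            demo_raise(cpu, handler); /* does not return */
        cpu->insn_executed++;
        cpu->PC += 4;
    }
}

/* One iteration of a dispatch loop. Returns the PC at which the next
 * block would be looked up. */
uint32_t demo_run_once(demo_cpu_t *cpu, uint32_t fault_pc, uint32_t handler)
{
    assert(cpu);
    if (setjmp(cpu->exit_env) == 0)
        demo_exec_block(cpu, fault_pc, handler);
    /* Reached both on normal completion and after demo_raise. */
    return cpu->PC;
}
```

Because the dispatch loop always picks the next block from the current PC, abandoning the block this way lets the handler block at stvec run next without touching the faulting block's pc_end.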

Jun 05 '24 12:06 ChinYikMing

In this section, we can see that if the MMU generates a fault, the former is called and then the latter is called. We may need to provide a similar mechanism to support exception handling.

Agree. Before merging this MMU work, we should refine the existing exception handling for hardware-aware behavior.

Jun 06 '24 04:06 jserv

Steps to reproduce the VM test:

make ENABLE_SYSTEM=1

Go to the tests/system directory, run make

build/rv32emu tests/system/vm.elf

Rerunning the steps above produces output like the following:

...
exit syscall called
inferior exit code 0

This means that the userspace application's instruction page fault (caused by address 0x4) has been handled and 4 KiB has been mapped for the corresponding PTE. Therefore, subsequent instructions do not cause any instruction page fault until execution goes beyond that 4 KiB.

Note: you can disassemble the main section of the user space application via: riscv32-unknown-elf-objdump -d -j .text.main tests/system/vm.elf

Jun 15 '24 18:06 ChinYikMing

Steps to reproduce the VM test:

make ENABLE_SYSTEM=1

Go to the tests/system directory, run make

build/rv32emu tests/system/vm.elf

Rerunning the steps above produces output like the following:

...
exit syscall called
inferior exit code 0

This means that the userspace application's instruction page fault (caused by address 0x4) has been handled and 4 KiB has been mapped for the corresponding PTE. Therefore, subsequent instructions do not cause any instruction page fault until execution goes beyond that 4 KiB.

Note: you can disassemble the main section of the user space application via: riscv32-unknown-elf-objdump -d -j .text.main tests/system/vm.elf

Rerun the steps to see the output of the MMU test suite:

INSTRUCTION FETCH PAGE FAULT TEST PASSED!
LOAD PAGE FAULT TEST PASSED!
STORE PAGE FAULT TEST PASSED!
inferior exit code 0

Jun 21 '24 23:06 ChinYikMing

RISC-V Architecture Test complains:

ERROR | rv32emu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/privilege/src/ecall.S : - : Failed

Jun 23 '24 10:06 jserv

Quoted from the comments of esp32-running-linux:

The TLB is a high-level cache of the page table, which stores the most recently used translations and makes them quickly and efficiently accessible. Instead of accessing the page table in main memory, the processor first checks the TLB to see if the translation of the virtual address is already stored there. If it is, the translation is used directly, without the need to access the page table in main memory. This reduces memory access latency and increases system performance.

esp32-running-linux provides a minimalist implementation of RV32 + MMU capable of running the Linux kernel, and its MMU/TLB is worth checking.
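As a rough illustration of the idea in the quote, here is a small direct-mapped TLB placed in front of a page table walk. It is not taken from esp32-running-linux or from this PR: all names are hypothetical, the walk callback is a stand-in, and physical addresses are assumed to fit in 32 bits for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TLB_ENTRIES 16

typedef struct {
    bool valid;
    uint32_t vpn, ppn;
} demo_tlb_entry_t;

typedef struct {
    demo_tlb_entry_t e[DEMO_TLB_ENTRIES];
    unsigned hits, misses;
} demo_tlb_t;

/* Stand-in for a page table walk: returns false on a page fault. */
typedef bool (*demo_walk_fn)(uint32_t vpn, uint32_t *ppn);

/* Example walk used for demonstration: maps vpn to vpn + 0x100. */
bool demo_walk_offset(uint32_t vpn, uint32_t *ppn)
{
    if (vpn >= 0x1000)
        return false;
    *ppn = vpn + 0x100;
    return true;
}

bool demo_tlb_translate(demo_tlb_t *tlb, demo_walk_fn walk, uint32_t va,
                        uint32_t *pa)
{
    assert(tlb && walk && pa);
    uint32_t vpn = va >> 12;
    demo_tlb_entry_t *ent = &tlb->e[vpn % DEMO_TLB_ENTRIES];
    if (ent->valid && ent->vpn == vpn) {
        tlb->hits++; /* no page table access needed */
    } else {
        uint32_t ppn;
        if (!walk(vpn, &ppn))
            return false; /* caller raises the page fault */
        tlb->misses++;
        *ent = (demo_tlb_entry_t){ true, vpn, ppn };
    }
    *pa = (ent->ppn << 12) | (va & 0xfff);
    return true;
}

/* Must be called when satp changes or on sfence.vma. */
void demo_tlb_flush(demo_tlb_t *tlb)
{
    for (int i = 0; i < DEMO_TLB_ENTRIES; i++)
        tlb->e[i].valid = false;
}
```

A second access to the same page hits the cached entry and skips the walk; flushing forces the next access to walk again.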

Aug 21 '24 19:08 jserv

Update CI pipeline to include system emulation tests.

Done.

Oct 21 '24 17:10 ChinYikMing

I defer to @vacantron for confirmation.

Oct 23 '24 12:10 jserv
