Basic Virtual Memory Implementation Fixes & Improvements
This PR addresses some previous feedback, most notably:
- Set-associative TLB (machine::TLB): Implements a set-associative Translation Lookaside Buffer (TLB) frontend over physical memory, handling virtual to physical translation, flush, and replacement policy.
- Pluggable Replacement Policies (machine::TLBPolicy): Abstract TLB replacement policy interface & implementations (RAND, LRU, LFU, PLRU) for set-associative tables.
- SV32 Page-Table Walker (machine::PageTableWalker): Performs multi-level page-table walks (SV32) in memory to resolve a virtual address to a physical one.
- Sv32Pte Bitfield Helpers (sv32.h): SV32-specific definitions: page-table entry (PTE) bitfields, shifts/masks, and PTE to physical address helpers.
- VirtualAddress (virtual_address.h): Lightweight VirtualAddress wrapper offering raw access, alignment checks, arithmetic, and comparisons.
- Add supervisor CSRs and sstatus handling: supervisor CSRs (sstatus, stvec, sscratch, sepc, scause, stval, satp) and a write handler that presents sstatus as a masked view of mstatus so supervisor-visible bits stay in sync.
- Store current privilege level in CoreState: tracks the hart's current privilege level in CoreState so that exception/return handling and visualization can read and update it from the central CoreState structure.
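For illustration, the lookup/insert/flush cycle of a set-associative TLB with a pluggable replacement policy can be sketched roughly as below. The shapes echo the `machine::TLB` and `machine::TLBPolicy` names from the list above, but all internals here are illustrative, not the PR's actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative sketch only; the real machine::TLB / machine::TLBPolicy
// interfaces in the PR may differ.
struct TlbEntry {
    bool valid = false;
    uint32_t vpn = 0; // virtual page number (tag)
    uint32_t ppn = 0; // physical page number
};

// Hypothetical replacement-policy interface (RAND/LRU/LFU/PLRU would
// implement this in the real code).
struct TlbPolicy {
    virtual ~TlbPolicy() = default;
    virtual size_t select_victim(size_t set_index) = 0;
    virtual void on_access(size_t set_index, size_t way) = 0;
};

// Trivial round-robin policy, standing in for the real policies.
struct RoundRobinPolicy : TlbPolicy {
    size_t ways;
    std::vector<size_t> next;
    RoundRobinPolicy(size_t sets, size_t ways) : ways(ways), next(sets, 0) {}
    size_t select_victim(size_t set) override {
        size_t w = next[set];
        next[set] = (w + 1) % ways;
        return w;
    }
    void on_access(size_t, size_t) override {}
};

class Tlb {
public:
    Tlb(size_t sets, size_t ways, TlbPolicy &policy)
        : sets_(sets), ways_(ways), policy_(policy), table_(sets * ways) {}

    // Translate a virtual page number; empty result means TLB miss.
    std::optional<uint32_t> lookup(uint32_t vpn) {
        size_t set = vpn % sets_;
        for (size_t way = 0; way < ways_; ++way) {
            TlbEntry &e = entry(set, way);
            if (e.valid && e.vpn == vpn) {
                policy_.on_access(set, way);
                return e.ppn;
            }
        }
        return std::nullopt;
    }

    // Insert a translation, evicting the policy-selected victim way.
    void insert(uint32_t vpn, uint32_t ppn) {
        size_t set = vpn % sets_;
        size_t way = policy_.select_victim(set);
        entry(set, way) = { true, vpn, ppn };
        policy_.on_access(set, way);
    }

    void flush() {
        for (TlbEntry &e : table_) e.valid = false;
    }

private:
    TlbEntry &entry(size_t set, size_t way) {
        return table_[set * ways_ + way];
    }
    size_t sets_, ways_;
    TlbPolicy &policy_;
    std::vector<TlbEntry> table_;
};
```

The policy is held by reference so different replacement strategies can be swapped in without touching the lookup path, mirroring the pluggable-policy idea above.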
Tests:
- Add SV32 page-table + TLB integration tests: a set of small assembly tests that exercise the SV32 page-table walker, SATP enablement, and the new TLB code. The tests create a root page table, map a virtual page at 0xC4000000, and then exercise several scenarios, verifying page-table-walker behaviour, SATP switching, and TLB caching/flush logic. The tests were written based on prior consultation.
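As background for these tests: an Sv32 virtual address splits into VPN[1] (indexing the root page table), VPN[0] (indexing the leaf table), and a 12-bit page offset, so the mapped page at 0xC4000000 has VPN[1] = 0x310 and VPN[0] = 0. A minimal sketch of the split (the helper names are made up, not the PR's `sv32.h` API):

```cpp
#include <cstdint>

// Sv32 virtual-address fields per the RISC-V privileged spec; helper
// names are illustrative.
constexpr uint32_t sv32_vpn1(uint32_t va) { return (va >> 22) & 0x3ff; }
constexpr uint32_t sv32_vpn0(uint32_t va) { return (va >> 12) & 0x3ff; }
constexpr uint32_t sv32_offset(uint32_t va) { return va & 0xfff; }
```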
UI Components:
- Show current privilege level in the core state view
- Virtual memory configuration in NewDialog
- TLB visualization and statistics dock
- VM toggle and "As CPU" memory access view
I am getting this weird zoom.
Notice that address sanitizer is failing in CI.
@jdupak thanks for reviewing the interfacing to the memory model architecture.
From my side: the changes to the processor pipeline diagram have been applied directly to the SVG files (src/gui/windows/coreview/schemas), but the current design uses the DRAW.IO source (extras/core_graphics) as the authoritative source of the pipeline visualization, and the SVGs are generated from that file. So the commit with the SVG change should include the extras/core_graphics/diagram.drawio change as well, or the extras/core_graphics/diagram.drawio change should be committed before the SVG regeneration commit. In the long term I would lean toward a single SVG file with tags for conditional rendering, but we have not got to that state yet; the current solution implemented by @jdupak is based on DRAW.IO with exports controlled by tagging (some documentation is in docs/developer/coreview-graphics/using-drawio-diagram.md).
For the memory view, I would not complicate it with a "Show virtual" checkbox. I would use only switching between As CPU (VMA), Cached, and Raw.
@ppisa @jdupak Thank you for your detailed feedback. I appreciate it and have made some changes based on your review. I would be grateful for any further feedback.
I pushed some slight edits. Barring the issue with new tests not being run I am fine with merging this.
I am going through the code. I have one overall remark: there are a lot of formatting changes mixed in with the functional changes. I am not against formatting changes, even though I think that formatting left by a human, for example to align some case lines into columns, sometimes has value. But formatting changes unrelated to the functional changes make review harder. So I would keep the patches as they are, but I would suggest separating formatting changes, even across all files modified later in the series, from the functional changes.
I do not like the is_mmio_region() and bypass_mmio() concept. Peripheral accesses should go through regular address translation. It is the responsibility of the OS to map I/O-related regions into the virtual address space of the kernel, or even of user applications, e.g. for mmap()-like accesses.
As for enabling the cache for accesses, there is a hack in QtRvSim which enforces the following uncached region:

```cpp
Cache::Cache(/* ... */)
    : /* ... */
    , uncached_start(0xf0000000_addr)
    , uncached_last(0xfffffffe_addr)
```
In the longer term, cacheability should be controlled from the page tables. But PBMT (Page-Based Memory Types) is supported only for Sv39 and larger translation configurations; see Chapter 14, "Svpbmt" Extension for Page-Based Memory Types:
| Mode | Value | Requested Memory Attributes |
|---|---|---|
| PMA | 0 | None |
| NC | 1 | Non-cacheable, idempotent, weakly-ordered (RVWMO), main memory |
| IO | 2 | Non-cacheable, non-idempotent, strongly-ordered (I/O ordering), I/O |
| - | 3 | Reserved for future standard use |
But the physical region marked to skip caching in the cache implementation (the current state) should be enough for now.
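Where Svpbmt is available (Sv39 and larger), the requested memory type from the table above lives in PTE bits 62:61. A minimal extraction helper might look like the following sketch; the enum and helper name are illustrative, not QtRvSim code:

```cpp
#include <cstdint>

// Svpbmt memory-type field, PTE bits 62:61 (Sv39+ only); names are
// illustrative stand-ins.
enum class Pbmt : uint8_t { PMA = 0, NC = 1, IO = 2, RESERVED = 3 };

constexpr Pbmt sv39_pte_pbmt(uint64_t pte) {
    return static_cast<Pbmt>((pte >> 61) & 0x3);
}
```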
Not so critical for now, but it should be solved in the longer time perspective. SRET can be executed even in M mode, so the type of the return should be propagated to control_state->exception_return in Core::memory(const ExecuteInterstage &dt). It is a question whether to add a signal which goes through all stages (more readable) or to use a bit from the instruction for local decoding of the type. MRET should not be allowed in supervisor mode. In general, I think that the current version does not mark system-level instructions and accesses to supervisor- and machine-mode CSRs as invalid in U mode. So some masking will be required at the decode level in the future. Some more flags need to be added to enum InstructionFlags (and to flags_to_check) to allow that checking, and they should then be updated on mode transition.
The behavior of xRET instructions is described in Section 3.1.6.1, "Privilege and Global Interrupt-Enable Stack in mstatus register". When SRET is executed in M mode, it executes the same as in S mode, but it should also clear MPRV to 0. This allows machine-level code to emulate some system-level operations.
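The decode-level gating discussed above could be sketched as follows; the flag names extend the spirit of enum InstructionFlags but are hypothetical, not the actual QtRvSim definitions:

```cpp
#include <cstdint>

// Hypothetical sketch of privilege gating for xRET at decode; flag and
// enum names are illustrative, not QtRvSim's InstructionFlags.
enum PrivilegeLevel : uint8_t { U = 0, S = 1, M = 3 };

enum InstructionFlags : uint32_t {
    IMF_NONE = 0,
    IMF_REQUIRES_S = 1u << 0, // needs at least S mode (e.g. SRET)
    IMF_REQUIRES_M = 1u << 1, // needs M mode (e.g. MRET)
};

// SRET is legal in S and M mode; MRET only in M mode; both trap in U mode.
constexpr bool xret_is_legal(uint32_t flags, PrivilegeLevel mode) {
    if ((flags & IMF_REQUIRES_M) && mode != M) return false;
    if ((flags & IMF_REQUIRES_S) && mode == U) return false;
    return true;
}
```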
It seems that the TLBs are updated from the start of the simulation. The TLB and its updates should be enabled only when the root page-table register (satp) is set, and the TLBs should not be updated in M mode at all.
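A minimal predicate capturing this rule might look like the following sketch (names are illustrative; MPRV-modified M-mode accesses are deliberately ignored here for simplicity):

```cpp
#include <cstdint>

// Sketch: on RV32, translation is active only when satp.MODE selects
// Sv32 (bit 31 set) and the hart is not in M mode. Names illustrative.
enum class Priv : uint8_t { U = 0, S = 1, M = 3 };

constexpr bool translation_active(Priv mode, uint32_t satp) {
    const bool sv32_enabled = (satp >> 31) & 1;
    return sv32_enabled && mode != Priv::M;
}
```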
There is one actual issue from CI: you cannot use ftruncate. It fails compilation on Win.
Never mind, this is broken on master. I will fix that. It does not block this PR.
@nemecad notice that I force-pushed your branch: there is zero diff at the end, but all the spurious format changes should be gone now. My apologies for introducing them.
CC @ppisa, it should now be easier to review.
Then there is another option: to define a type AddressWithMode which would combine (and inherit from) Address and AccessMode. That way it could be passed through the front-end hierarchy with minimal modifications, and if some automatic promotion is provided, it could even be assigned from a plain Address. @jdupak do you have an opinion about this? It is questionable what the default AccessMode should be in such a case, so it may be better if a missing mode causes errors. From the point of view of memory viewers it is best to have the highest privilege, but that automatically disables translation. So probably the current mode should be copied into AddressWithMode from the core state in the memory view. But that is only relevant in MEM_ACC_AS_CPU; in the other cases M mode is OK, and cache disabling is not required because it is controlled through the cache options etc.
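A rough sketch of what such an AddressWithMode type could look like, assuming the promotion-from-Address behaviour described above (all names and defaults here are tentative, not an agreed design):

```cpp
#include <cstdint>

// Tentative sketch of the proposed AddressWithMode: the address and the
// access mode travel the front-end hierarchy together. Names and the
// default promotion are illustrative only.
enum class AccessMode : uint8_t { U = 0, S = 1, M = 3 };

struct Address {
    uint64_t raw = 0;
};

struct AddressWithMode : Address {
    AccessMode mode = AccessMode::M;

    // Implicit promotion: a bare Address defaults to M mode, matching
    // the "highest privilege, translation disabled" behaviour that
    // memory viewers expect.
    AddressWithMode(Address a) : Address(a) {}
    AddressWithMode(Address a, AccessMode m) : Address(a), mode(m) {}
};

inline bool translates(const AddressWithMode &a) {
    return a.mode != AccessMode::M;
}
```

With this shape, existing call sites that pass a plain Address keep compiling (promoted to an M-mode, untranslated access), while the MEM_ACC_AS_CPU path can construct the value with the mode copied from the core state.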
When I think about CSR access privileges, it is a question whether the register-number-to-privilege mapping should be resolved in Instruction::flags_alu_op_mem_ctl at all. The standard defines that an access to an unimplemented CSR register should result in an illegal-instruction exception. This means that this exception should be reported directly by control_state->read in Core::decode and/or control_state->write in Core::memory. It can be done directly by the read/write function, through a signal, or by a return value. It is already implemented via a C++ exception in:
```cpp
RegisterValue ControlState::read(Address address, PrivilegeLevel current_priv) const;
void ControlState::write(Address address, RegisterValue value, PrivilegeLevel current_priv);
size_t ControlState::get_register_internal_id(Address address);
```
It should be changed to an emulated core exception attached to the corresponding instruction. It can be done by returning information about the CSR access error to the decode and memory stages and converting it to an exception there, or it can be solved by catching the C++ exception there and converting it to an exception in the matching instruction.
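The catch-and-convert variant could be sketched as below; CsrAccessError, csr_read, and the surrounding helpers are stand-ins, not the real ControlState API:

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical error type thrown by the CSR file on a bad CSR number or
// insufficient privilege (stand-in for ControlState's C++ exception).
struct CsrAccessError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

enum class CoreException { NONE, ILLEGAL_INSTRUCTION };

// Stand-in for ControlState::read throwing on an unimplemented CSR.
uint32_t csr_read(uint32_t csr_number) {
    if (csr_number > 0xfff) throw CsrAccessError("unimplemented CSR");
    return 0; // dummy value
}

// Decode-stage helper: turn the C++ exception into an emulated core
// exception attached to the matching instruction, instead of letting it
// unwind through the simulator.
std::pair<std::optional<uint32_t>, CoreException> decode_csr_read(uint32_t csr) {
    try {
        return { csr_read(csr), CoreException::NONE };
    } catch (const CsrAccessError &) {
        return { std::nullopt, CoreException::ILLEGAL_INSTRUCTION };
    }
}
```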
When the code is rebased and some prototype of AddressWithMode or another change is prepared, we can meet and discuss it together to find the right solution for the CSRs and to finalize the privilege effects in the TLB.
> Then there is another option and to define type `AddressWithMode` which would combine and inherit from `Address` and `AccessMode` and that way it can be passed through front-end hierarchy with minimal modifications and some automatic promotion is made it can be even assigned by `Address`. @jdupak do you have opinion about this? But it is questionable what should be defined as `AccessMode` in such case, so it is better if that causes errors. Because from point of memory viewers it is best to have the highest privilege but then it automatically disables translation. So probably current mode should be copied into `AddressWithMode` from core state in memory view. But that is only relevant in `MEM_ACC_AS_CPU`, in other cases M mode is OK a cache disable is not required because it is controlled by through cache options etc...
Can you elaborate why is this needed? I did not follow the whole discussion.
Thanks for rebase
> Can you elaborate why is this needed? I did not follow the whole discussion.
The TLB translation has to be skipped in M mode. In addition, the translation needs to be done with the ASID matching the core's ASID, and for Sv39 and Sv48 it can even result in a request to disable caching for a given address, which should be propagated to the cache component in the frontend path. The currently proposed solution for the mode is to keep a copy of the core state (ASID and mode) in each TLB, which requires updating them. Because this state influences memory accesses, propagation of the mode and ASID has been moved to the memory ctl operations; but because it is not propagated through the hierarchy together with the address, there is now a virtual method which caches this CTL information and updates the state. At this moment that happens after the access, which is incorrect anyway, and it requires type casting and cannot reach lower levels of the hierarchy.
On the other hand, propagating the access mode together with the address is natural, and it was done even in historical CPUs with an external MMU (see the FC0, FC1, FC2 signals on slides 17 and 18 of my presentation "m68k Classic CISC Architecture"). When the mode and ASID are passed together with the address, there is no need for an ASID and mode state copy in the TLBs. The field could be kept there to retain information for visualization of the last translation, but not for the actual translation. If we consider today's hyper-threaded CPUs with simultaneous execution of multiple threads, they have to pass the thread number and mode together from the load/store unit to keep track of which access belongs to which thread (or hart, in the RISC-V case) so that the correct TLB entries are used. So I think that this would simplify the code and make the simulator more true to real CPU implementations. I have already pointed out some problems of the current solution in the code review above.