RISC-V A atomic extension support in Ibex
Hello, my team is considering using Ibex with a secure embedded operating system (seL4), and one of the requirements we need to check off the list to support it is the RISC-V A atomic operations extension. This extension adds load-reserved/store-conditional and a handful of read-modify-write transactions.
As I understand it from my brief discussion with Philipp Wagner and Greg Chadwick on Zulip here, this isn't something that has been planned out yet, but there is some interest. We're happy to do the legwork to add and test this feature, but the end goal is lowRISC acceptance, so we want to make sure that whatever work we do matches up with your expectations.
Greg mentioned that implementation without modifying the memory interface probably wouldn't be an issue for merge, but I had originally been considering extending the interface from using a one-bit read/write signal to a four-bit command so that the read-modify-write transactions could be handled downstream via TL-UH or AXI5's atomic opcodes. Of course this extension would be designed to be optional, so when disabled the upper 3 bits of that command would be optimized out along with any hardware that depends upon them.
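To make the four-bit command idea concrete, here is a rough Python sketch of one possible encoding. The `MemCmd` name and every bit pattern below are assumptions made up for illustration; nothing here comes from Ibex, TileLink, or AXI. The only property it tries to capture is that bit 0 keeps the meaning of the existing read/write signal, so the upper three bits can be dropped when atomics are disabled.

```python
from enum import IntEnum


class MemCmd(IntEnum):
    """Hypothetical 4-bit command encoding (illustrative only)."""
    READ = 0b0000     # plain load: upper bits zero, bit 0 = 0
    WRITE = 0b0001    # plain store: upper bits zero, bit 0 = 1
    LR = 0b0010
    SC = 0b0011
    AMOSWAP = 0b0100
    AMOADD = 0b0101
    AMOXOR = 0b0110
    AMOAND = 0b0111
    AMOOR = 0b1000
    AMOMIN = 0b1001
    AMOMAX = 0b1010
    AMOMINU = 0b1011
    AMOMAXU = 0b1100


def is_atomic(cmd: MemCmd) -> bool:
    """Any nonzero upper 3 bits marks an A-extension operation."""
    return (cmd >> 1) != 0


def legacy_we(cmd: MemCmd) -> int:
    """With atomics disabled only bit 0 remains: the old read/write bit."""
    return cmd & 1
```

Plain reads, plain writes, LR, SC and the nine RV32A AMOs come to 13 commands, so four bits are enough with a few encodings to spare.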
My Environment
N/A, in early planning.
EDA tool and version: N/A.
Operating system: seL4.
Version of the Ibex source code: Latest.
Hi @jtate-pu, have you done any further work on this?
As a first step a proposal for the modified bus protocol would be good. The extra bits to give an operation rather than just read/write shouldn't be a big problem. From a quick look at the A spec I don't think we'd need any extra interface signals beyond that? (In particular there's no compare-and-swap operation where you'd need to send out two operands).
What were your thoughts on implementing load-reserved and store-conditional? If you wanted an implementation in Ibex itself, we'd need some kind of new snoop/notification channel so Ibex can be told by the system if the reservation has been invalidated.
Indeed, the simplest implementation would just be to say Ibex doesn't ever actually execute the atomics; they just look much like other memory operations, with the memory system expected to provide the implementation. You could build a small block that uses the Ibex interface protocol and provides a basic atomics implementation when coupled to some downstream memory, under some assumptions (only the block can talk to the downstream memory and nothing upstream will cache the data). This would allow people to build simple Ibex systems that use atomics without much effort (e.g. some kind of 2-core Ibex system). Those with more complex requirements will need something more sophisticated and will be translating to TileLink/AXI anyway.
> Hi @jtate-pu, have you done any further work on this?
Hi @GregAC, not a whole lot yet but I do have some other thoughts on the topic I'd like to share.
> As a first step a proposal for the modified bus protocol would be good. The extra bits to give an operation rather than just read/write shouldn't be a big problem.
Sounds good. Does lowRISC have a particular proposal documentation format that I can follow?
> From a quick look at the A spec I don't think we'd need any extra interface signals beyond that? (In particular there's no compare-and-swap operation where you'd need to send out two operands).
> Indeed, the simplest implementation would just be to say Ibex doesn't ever actually execute the atomics; they just look much like other memory operations, with the memory system expected to provide the implementation. You could build a small block that uses the Ibex interface protocol and provides a basic atomics implementation when coupled to some downstream memory, under some assumptions (only the block can talk to the downstream memory and nothing upstream will cache the data). This would allow people to build simple Ibex systems that use atomics without much effort (e.g. some kind of 2-core Ibex system). Those with more complex requirements will need something more sophisticated and will be translating to TileLink/AXI anyway.
That's my reading of it as well, and I think it is the least complex option to verify. The way I envision it, the atomic read-modify-write transactions can simply behave as a load variant in the pipeline, where the outgoing write data field is set to the value from a register instead of being cleared to zero as it would be for a normal load.
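As a rough illustration of that split, here is a small Python behavioral model, not RTL and not an existing Ibex component. The core side sends an operation, an address and a register value as write data; a hypothetical `AtomicAdapter` block, assumed to be the only thing that can reach its downstream memory, performs the read-modify-write and hands back the old value like ordinary load data. The class and function names are invented for this sketch; the per-operation behavior is meant to follow the RV32A AMO definitions.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit value as two's complement."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


# New memory value as a function of (old memory value, register operand).
AMO_OPS = {
    "amoswap": lambda old, src: src,
    "amoadd": lambda old, src: old + src,
    "amoxor": lambda old, src: old ^ src,
    "amoand": lambda old, src: old & src,
    "amoor": lambda old, src: old | src,
    "amomin": lambda old, src: min(old, src, key=to_signed),
    "amomax": lambda old, src: max(old, src, key=to_signed),
    "amominu": lambda old, src: min(old, src),
    "amomaxu": lambda old, src: max(old, src),
}


class AtomicAdapter:
    """Hypothetical block assumed to be the sole master of its memory."""

    def __init__(self, mem: dict):
        self.mem = mem  # word address -> 32-bit value

    def amo(self, op: str, addr: int, wdata: int) -> int:
        old = self.mem.get(addr, 0) & MASK32               # read phase
        new = AMO_OPS[op](old, wdata & MASK32) & MASK32    # modify phase
        self.mem[addr] = new                               # write phase
        return old  # goes back to the core like load data
```

Because nothing else can touch the memory between the read and write phases in this model, the sequence is atomic by construction; that is exactly the assumption that stops holding once another master shares the downstream memory.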
> What were your thoughts on implementing load-reserved and store-conditional? If you wanted an implementation in Ibex itself, we'd need some kind of new snoop/notification channel so Ibex can be told by the system if the reservation has been invalidated.
In a strict uniprocessor implementation with the RMW transactions implemented via a state machine in the Ibex/TL-UL bus bridge logic, a situation I expect to be fairly common, support for these instructions should be trivial to add, provided we add the hard requirement that other IP blocks in the system can never interact with the core's LR reservations. The spec states that any subsequent regular loads and stores from the same thread that issued an LR instruction will not impact the outstanding reservation, and the ISA already has a requirement that any multitasking operating system aware of the A extension must clear an outstanding reservation on context switch by always issuing an SC to a sacrificial memory location. Since we don't need to narrow the address scope at all to avoid collisions with other transactors' memory accesses, a single flip-flop of additional state would be sufficient to indicate that an LR instruction has been issued and that all memory has been reserved.
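A minimal Python sketch of that single-flop scheme, purely as a behavioral model under the uniprocessor assumptions above (the `UniprocessorLrSc` name and its methods are made up for illustration):

```python
class UniprocessorLrSc:
    """Behavioral model: one bit of state reserves all of memory."""

    def __init__(self):
        self.mem = {}          # word address -> value
        self.reserved = False  # the single flip-flop

    def load(self, addr: int) -> int:
        # Ordinary loads leave the reservation alone.
        return self.mem.get(addr, 0)

    def store(self, addr: int, data: int) -> None:
        # Ordinary stores from the same hart leave the reservation alone.
        self.mem[addr] = data

    def lr(self, addr: int) -> int:
        # LR sets the flop; the address is not tracked at all.
        self.reserved = True
        return self.mem.get(addr, 0)

    def sc(self, addr: int, data: int) -> int:
        """Return 0 on success and nonzero on failure, like the value SC writes to rd."""
        success = self.reserved
        self.reserved = False  # every SC, pass or fail, clears the flop
        if success:
            self.mem[addr] = data
            return 0
        return 1
```

In this model, the context-switch SC to a sacrificial location succeeds and clears the flop, so a later SC from the interrupted LR/SC sequence fails and software retries.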
For systems that can't meet the above restrictions, either because they are multi-processor implementations or because other IP blocks in the system may interact with software-defined synchronization primitives, I'm very concerned about transaction reordering resulting in a store being snuck in by an optimizing memory controller after an LR has already been issued, or a store before an SC that has already been deemed to have passed. I don't think a simple processor-to-processor side-channel snoop/notification system would be sufficient in the general case. Without resorting to having some kind of common data port between a cluster of processors and sharing a single ID, we'd need some other means to ensure transactions during critical moments cannot be reordered, and to my knowledge there's no way to express that in TL-UL/TL-UH.
I suspect that the easiest way to implement LR/SC in a multi-processor environment, which does in fact seem to be the way that Rocket does it, would be to use TL-C and its cacheline ownership management functionality to ensure that no other processor has written to the cacheline allocated with LR. That really seems like a lot of overhead, though, for a processor as small as Ibex that doesn't even have a data cache. I'm just not sure it would be worth it.