amaranth-soc

Peripheral API design: exposing bus interfaces

Open jfng opened this issue 5 years ago • 7 comments

Peripherals are currently a missing building block in nmigen-soc.

They would wrap cores behind a CSR interface (and interrupts, although handling these could be the subject of a separate issue). For example, an AsyncSerialPeripheral wrapper in nmigen-soc would provide access to an AsyncSerial core from nmigen-stdio. The baud rate, RX/TX data, strobes, etc. would be accessed through CSRs.

Integration would be straightforward for peripherals that provide nothing more than CSRs:

  • CSRs are gathered behind a csr.Multiplexer, whose bus interface is exposed by the peripheral
  • all peripheral interfaces are gathered behind a single csr.Decoder
  • the csr.Decoder bus interface is bridged to the SoC interconnect
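The three integration steps above amount to address allocation. The plain-Python sketch below does not use nmigen-soc: `window_size` and `build_csr_map` are hypothetical helpers, and it assumes every register fits in one CSR data word and that each peripheral's window is rounded up to a power of two and aligned to its size, so a decoder can select a peripheral by masking address bits.

```python
def window_size(n_registers):
    """Round a peripheral's register count up to a power of two."""
    size = 1
    while size < n_registers:
        size *= 2
    return size


def build_csr_map(peripherals):
    """Assign each peripheral's CSR multiplexer a window behind one decoder.

    `peripherals` is a list of (name, [register names]) pairs. Returns a dict
    mapping "peripheral.register" to its absolute CSR address.
    """
    address_map = {}
    base = 0
    for name, registers in peripherals:
        size = window_size(len(registers))
        # Align the window to its own size so decoding is a simple mask.
        base = (base + size - 1) // size * size
        for offset, reg in enumerate(registers):
            address_map[f"{name}.{reg}"] = base + offset
        base += size
    return address_map
```

For example, a five-register UART followed by a two-register timer places the UART at addresses 0 to 4 (inside an 8-word window) and the timer at addresses 8 and 9.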

But what about peripherals that also provide a memory interface (e.g. DRAM controllers, flash controllers)? I see two possible approaches:

Approach A: exposing two separate bus interfaces for CSRs and memories

CSRs would be handled the same way as described above, but the peripheral would also provide a separate bus interface to access its memories (e.g. WB4). I think LiteX follows a similar approach.

As a consequence, the CSRs and memories of a given peripheral are located in separate regions of the SoC address space.

pros:

  • lower resource consumption; all the CSRs of the SoC are still pooled behind a single csr.Decoder, and the WB4 interface of a peripheral is directly connected to its logic.

cons:

  • transactions may be reordered if, e.g., the WB4 interface sits behind a FIFO but the CSR interface does not.
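The reordering hazard can be shown with a small discrete-time model. This is a plain-Python sketch, not a bus simulation: it assumes the WB4 path adds a fixed FIFO latency of a few cycles while the CSR path delivers writes on the next cycle, and the `deliver` helper is hypothetical.

```python
def deliver(writes, latency):
    """Return writes in the order they reach the peripheral.

    `writes` is a list of (issue_cycle, port, value) tuples in program order.
    `latency` maps each port name to its delay in cycles.
    """
    arrivals = [(cycle + latency[port], index, port, value)
                for index, (cycle, port, value) in enumerate(writes)]
    # Sort by arrival cycle; ties keep program order via the index.
    arrivals.sort()
    return [(port, value) for _, _, port, value in arrivals]


program = [
    (0, "wb4", "data word"),  # issued first, through the buffered memory port
    (1, "csr", "start"),      # issued second, through the direct CSR port
]
```

With `{"wb4": 4, "csr": 1}` the CSR write overtakes the data word; with equal latencies on both ports, program order is preserved.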

Approach B: exposing a single bus interface for both CSRs and memories

Instead of two separate interfaces, a memory-capable peripheral would expose a single bus interface such as WB4 or AXI4. As a consequence, all the resources of a peripheral are located in the same address space region.

  • peripherals would have a local wishbone.Decoder, whose bus interface would be exposed
  • memory interfaces would be added to the decoder
  • CSRs would be grouped into banks, and each bank would be bridged to the same decoder (e.g. csr.Multiplexer -> WishboneCSRBridge -> wishbone.Decoder)
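A plain-Python sketch of the local decoding in Approach B, assuming a hypothetical layout where a 4 KiB memory window sits at the bottom of the peripheral's region and a single CSR bank (reached through a Wishbone-to-CSR bridge) sits above it. `LOCAL_MAP` and `decode` are illustrative names, not part of nmigen-soc.

```python
# Hypothetical local address map of a memory-capable peripheral, in bytes.
# Each entry is (name, base, size); sizes are powers of two.
LOCAL_MAP = [
    ("memory",   0x0000, 0x1000),  # e.g. a flash or DRAM window
    ("csr_bank", 0x1000, 0x0100),  # csr.Multiplexer behind a bridge
]


def decode(address):
    """Return (resource name, offset within it) for a peripheral-local address."""
    for name, base, size in LOCAL_MAP:
        if base <= address < base + size:
            return name, address - base
    raise ValueError(f"address {address:#x} is not mapped")
```

Each additional CSR bank means another entry in the table and another bridge in gateware, which is where the extra resource cost of many banks would come from.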

pros:

  • peripherals with a single standard bus interface are easier to integrate when instantiated alone (counterargument: users may prefer using the bare nmigen-stdio cores instead, if available)
  • the address space layout of a peripheral would be flexible to the point where one could mimic the peripherals of another SoC. This could facilitate porting/reusing drivers.

cons:

  • some layouts may consume significantly more resources, e.g. if many CSR banks are requested (although I assume that the general case consists of a single CSR bank).

Any thoughts on this? cc @whitequark @awygle @enjoy-digital and others

jfng, Feb 19 '20 15:02

To get my biases out of the way: I am most concerned about the use case where there is no CPU, and possibly no bus. I believe this case is covered by wrapping nmigen-soc around nmigen-stdio, so I'm not too worried about it, but you should know where I'm coming from.

Approach A seems more flexible to me, in that it can be configured to act like Approach B. With Approach A, I can hook up AXI-Lite to the control port and AXI4 to the data port for AXI SoCs, and just hook everything up to the same WB4 bus for Wishbone SoCs. I believe the downside of this approach can be mitigated by requiring the control and data ports to have matched pipelining delays, or at the very least documenting the difference if one exists so that the SoC integrator can match them if desired.
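The claim that Approach A can be configured to act like Approach B can be sketched in plain Python: a hypothetical parent-level `attach` function places a peripheral's separate control and data ports either on two different buses (AXI-Lite plus AXI4) or side by side on one shared WB4 bus. Port names, bus names, sizes and base addresses are assumptions for illustration.

```python
def attach(ports, placement):
    """Map each port of a peripheral to an address range on some bus.

    `ports`: port name -> window size in bytes.
    `placement`: port name -> (bus name, base address).
    Returns port name -> (bus name, first address, last address + 1).
    """
    return {name: (placement[name][0], placement[name][1], placement[name][1] + size)
            for name, size in ports.items()}


ports = {"control": 0x100, "data": 0x10000}

# AXI-style SoC: control on an AXI-Lite segment, data on an AXI4 segment.
split = attach(ports, {"control": ("axi_lite", 0x4000_0000),
                       "data":    ("axi4",     0x8000_0000)})

# Wishbone-style SoC: both ports on the same WB4 bus, next to each other.
shared = attach(ports, {"control": ("wb4", 0x3000_0000),
                        "data":    ("wb4", 0x3001_0000)})
```

The peripheral itself is unchanged in both cases; only the parent's placement differs.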

awygle, Feb 19 '20 17:02

> Approach A seems more flexible to me, in that it can be configured to act like Approach B. With Approach A, I can hook up AXI-Lite to the control port and AXI4 to the data port for AXI SoCs, and just hook everything up to the same WB4 bus for Wishbone SoCs. I believe the downside of this approach can be mitigated by requiring the control and data ports to have matched pipelining delays, or at the very least documenting the difference if one exists so that the SoC integrator can match them if desired.

I think you just changed my mind on this! (I was in favor of approach B)

Both of the use-cases I highlighted for Approach B are actually doable with separate CSR and memory interfaces, namely:

> • peripherals with single standard bus interface are easier to integrate when instantiated alone

The CSR bus interface could just be bridged by a parent module to the WB4/AXI4 bus, resulting in a "single standard bus interface".

> • the address space layout of a peripheral would be flexible to the point where one could mimick the peripherals of another SoC. This could facilitate porting/reusing drivers.

Similarly, a parent module could wrap the peripheral and reorganize its address space, and expose whatever layout may be needed in order to reuse a particular driver.
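Such a wrapper boils down to an address translation table. Below is a plain-Python sketch, assuming a hypothetical "foreign" UART register layout that an existing driver expects; `NATIVE`, `FOREIGN`, `FOREIGN_TO_NATIVE` and `translate` are illustrative names, and a real wrapper would implement the same lookup in gateware.

```python
# Native CSR offsets of the peripheral (as allocated by its own multiplexer).
NATIVE = {"divisor": 0x0, "rx_data": 0x1, "rx_rdy": 0x2, "tx_data": 0x3, "tx_rdy": 0x4}

# Hypothetical layout an existing driver expects: byte offsets of each register.
FOREIGN = {"tx_data": 0x00, "rx_data": 0x04, "tx_rdy": 0x08, "rx_rdy": 0x0c, "divisor": 0x10}

# Translation table the wrapper would implement: foreign offset -> native offset.
FOREIGN_TO_NATIVE = {FOREIGN[name]: NATIVE[name] for name in FOREIGN}


def translate(foreign_offset):
    """Map an access in the foreign layout onto the peripheral's native CSRs."""
    try:
        return FOREIGN_TO_NATIVE[foreign_offset]
    except KeyError:
        raise ValueError(f"offset {foreign_offset:#x} is not in the foreign layout") from None
```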

jfng, Feb 19 '20 18:02

So, in the case of peripherals with CSRs, I'm thinking of a csr.Peripheral mixin that would be used like this (without interrupts, for now):

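from nmigen import Elaboratable, Module
from nmigen.lib.fifo import SyncFIFO
from nmigen_soc import csr
from nmigen_stdio.serial import AsyncSerial

# NOTE: csr.Peripheral, self.csr() and self.csr_bridge() are the API proposed
# in this comment; they are not part of nmigen-soc.
class AsyncSerialPeripheral(csr.Peripheral, Elaboratable):
    def __init__(self, *, rx_depth=16, tx_depth=16, **kwargs):
        super().__init__()

        # Wrapped core and its RX/TX buffers.
        self._phy     = AsyncSerial(**kwargs)
        self._rx_fifo = SyncFIFO(width=self._phy.rx.data.width, depth=rx_depth)
        self._tx_fifo = SyncFIFO(width=self._phy.tx.data.width, depth=tx_depth)

        # Each call requests a CSR of the given width and access mode.
        self._divisor = self.csr(self._phy.divisor.width, "rw")
        self._rx_data = self.csr(self._phy.rx.data.width, "r")
        self._rx_rdy  = self.csr(1, "r")
        self._tx_data = self.csr(self._phy.tx.data.width, "w")
        self._tx_rdy  = self.csr(1, "r")

        # The bridge gathers all requested CSRs; its bus is exposed by the peripheral.
        self._bridge  = self.csr_bridge()
        self.csr_bus  = self._bridge.bus

    def elaborate(self, platform):
        m = Module()
        m.submodules.bridge  = self._bridge
        m.submodules.phy     = self._phy
        m.submodules.rx_fifo = self._rx_fifo
        m.submodules.tx_fifo = self._tx_fifo

        # ... (connect the CSRs to the PHY and FIFOs)

        return m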
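Most of the mixin's work is bookkeeping: `csr()` records each requested register, and `csr_bridge()` lays them out once all have been requested. The plain-Python model below sketches only that bookkeeping; `CSRPeripheralModel`, `CSRSpec` and the 8-bit data width are assumptions, and it assumes no extra address alignment. A real implementation would create csr.Element objects and add them to a csr.Multiplexer instead.

```python
from dataclasses import dataclass


@dataclass
class CSRSpec:
    width: int   # register width in bits
    access: str  # "r", "w" or "rw"


class CSRPeripheralModel:
    """Plain-Python stand-in for the proposed csr.Peripheral mixin."""

    def __init__(self, data_width=8):
        self._data_width = data_width
        self._csrs = []

    def csr(self, width, access):
        if access not in ("r", "w", "rw"):
            raise ValueError(f"invalid access mode {access!r}")
        spec = CSRSpec(width, access)
        self._csrs.append(spec)
        return spec

    def csr_bridge(self):
        """Return (spec, address) pairs in the order the CSRs were requested."""
        layout = []
        address = 0
        for spec in self._csrs:
            layout.append((spec, address))
            # A register wider than the data bus spans several addresses (ceiling division).
            address += -(-spec.width // self._data_width)
        return layout
```

Requesting a 16-bit divisor, an 8-bit data register and a 1-bit ready flag, in that order, places them at addresses 0, 2 and 3.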

For memory interfaces, a separate wishbone.Peripheral mixin would provide:

  • a self.window() method that would return a wishbone.Interface
  • a self.wb_bridge() method that would return a bridge to all the requested windows.

That way, a peripheral that requires both a CSR bus and a WB4 bus would inherit from both csr.Peripheral and wishbone.Peripheral.
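Inheriting from both mixins works with Python's cooperative `super().__init__()`, as long as each mixin forwards to the next class in the method resolution order. Below is a plain-Python sketch with hypothetical stand-ins `CSRMixin` and `WishboneMixin` (not nmigen-soc classes); `window()` just records a size instead of returning a wishbone.Interface.

```python
class CSRMixin:
    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # keep the MRO chain going
        self._csrs = []

    def csr(self, width, access):
        self._csrs.append((width, access))
        return len(self._csrs) - 1


class WishboneMixin:
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._windows = []

    def window(self, size):
        self._windows.append(size)
        return len(self._windows) - 1


class FlashControllerModel(CSRMixin, WishboneMixin):
    """Hypothetical peripheral with both a CSR bus and a WB4 bus."""

    def __init__(self):
        super().__init__()  # runs CSRMixin.__init__, then WishboneMixin.__init__
        self.program = self.csr(1, "w")
        self.buffer = self.window(0x1000)
```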

Would this be acceptable ?

jfng, Feb 19 '20 23:02

I think that we have to be careful not to limit the structure of the CSR interface.

Having the interface glom all the csr.whatever into a single csr.bus would make a Harvard interface difficult.

Perhaps passing in a bus instance and adding to that bus interface would work better.

I think @awygle's observation is important: a minimal design without a CPU or (Wishbone|AXI|whatever) interface must be possible. We should be able to make an nmigen-soc design with nothing but 0 or more CSR interfaces.
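One reading of this suggestion, sketched in plain Python: the caller creates bus objects and passes one into each peripheral, which adds its registers to it. Nothing forces a single global bus, so registers can be split across several buses, none of which needs a CPU attached. `CSRBusModel` and `TimerPeripheralModel` are hypothetical names.

```python
class CSRBusModel:
    """Hypothetical CSR bus that hands out consecutive addresses."""

    def __init__(self, name):
        self.name = name
        self.registers = {}

    def add(self, owner, reg):
        address = len(self.registers)
        self.registers[f"{owner}.{reg}"] = address
        return address


class TimerPeripheralModel:
    def __init__(self, name, bus):
        # The bus is an explicit constructor input instead of being created here.
        self.reload = bus.add(name, "reload")
        self.value = bus.add(name, "value")


control_bus = CSRBusModel("control")
status_bus = CSRBusModel("status")
t0 = TimerPeripheralModel("timer0", control_bus)
t1 = TimerPeripheralModel("timer1", status_bus)
```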

zignig, Feb 20 '20 04:02

FYI: I'm working on a library I'm calling systemonachip. Here is an example: https://github.com/tannewt/systemonachip/blob/main/systemonachip/peripheral/timer.py#L12

It is based on lambdasoc but makes two changes:

  1. Uses data descriptors for CSR definitions. Their behavior changes based on the bus of the instance: if it's a Record, they produce the csr.Element; if not, they read the value at their offset in the memory window. This allows the value to be read from the outside for use in higher-level driver functions. This works with the simulator too.
  2. Passes the bus/memory window into the constructor. This makes it an explicit input, which enables dual-role classes.
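The descriptor idea can be sketched in plain Python. This is loosely modeled on the description above, not on systemonachip's actual code: `Register`, `HardwareBus`, `MemoryWindow` and `TimerRegisters` are hypothetical, a tuple stands in for the csr.Element, and a little-endian byte buffer stands in for the memory window.

```python
class HardwareBus:
    """Stand-in for the Record-based bus used when elaborating gateware."""


class MemoryWindow:
    """Stand-in for a memory window: a byte buffer seen from the outside."""

    def __init__(self, size):
        self.data = bytearray(size)


class Register:
    """Data descriptor: one definition, two behaviors depending on the bus."""

    def __init__(self, offset, width_bytes, access="rw"):
        self.offset = offset
        self.width_bytes = width_bytes
        self.access = access

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if isinstance(instance.bus, HardwareBus):
            # Gateware side: describe the register (a real version would build a csr.Element).
            return ("element", self.name, self.width_bytes * 8, self.access)
        # Driver/simulation side: read the value at this register's offset.
        raw = instance.bus.data[self.offset:self.offset + self.width_bytes]
        return int.from_bytes(raw, "little")

    def __set__(self, instance, value):
        if isinstance(instance.bus, HardwareBus):
            raise AttributeError("cannot write a register while describing hardware")
        raw = value.to_bytes(self.width_bytes, "little")
        instance.bus.data[self.offset:self.offset + self.width_bytes] = raw


class TimerRegisters:
    reload = Register(0x0, 4)
    enable = Register(0x4, 1)

    def __init__(self, bus):
        self.bus = bus  # passed in explicitly, as in point 2
```

The same class then serves both roles: `TimerRegisters(HardwareBus()).reload` yields an element description, while `TimerRegisters(MemoryWindow(8)).reload` reads an integer.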

tannewt, Aug 10 '20 22:08

This issue was discussed in two IRC meetings three years ago, but I forgot to summarize their conclusions.

20/07/20: https://freenode.irclog.whitequark.org/nmigen/2020-07-20#1595274233-1595276561;

There is consensus for Approach A. PeripheralInfo must be modified to hold the memory map of every bus interface of a peripheral.
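A rough plain-Python sketch of that change: instead of a single memory map, the info object keeps one per bus interface, so the same offset can be used independently on each interface. `PeripheralInfoModel` and `MemoryMapModel` are hypothetical stand-ins, not the amaranth-soc classes.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryMapModel:
    """Hypothetical memory map: resource name -> (start, end) address range."""
    resources: dict = field(default_factory=dict)

    def add(self, name, start, size):
        for other, (s, e) in self.resources.items():
            if start < e and s < start + size:
                raise ValueError(f"{name} overlaps {other}")
        self.resources[name] = (start, start + size)


@dataclass
class PeripheralInfoModel:
    """One memory map per bus interface, keyed by interface name."""
    memory_maps: dict = field(default_factory=dict)

    def memory_map(self, interface):
        return self.memory_maps.setdefault(interface, MemoryMapModel())
```

For a flash controller, a "program" CSR at offset 0 of the CSR interface and a buffer at offset 0 of the Wishbone interface do not conflict, because they live in different maps.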

While a bus-agnostic API (consisting of memory ports and CSR elements) could automate compatibility with multiple bus protocols, some performance-critical features such as bursts would be hard, if not impossible, to abstract over. Feature support would be limited to the lowest common denominator.

27/07/20: https://freenode.irclog.whitequark.org/nmigen/2020-07-27#1595876880-1595885191;

Consider a hypothetical flash controller peripheral. It has a memory and a CSR element with a "program" bit. Setting this bit has the side effect of programming the flash storage with the contents of the memory.

Without further assumptions, this interaction is susceptible to data hazards, regardless of how many bus interfaces the peripheral has. Writes may be reordered such that the "program" bit is set before the last word of data reaches its destination.

Memory accesses may be delayed, combined or reordered at every step between the initiator, cache hierarchy, interconnect, and the peripheral:

  • If the initiator is a CPU, changes to memory ordering can be made by both the compiler and the CPU.
  • Memory-like regions would likely be cached by the initiator; writes may be delayed or combined before becoming bus transactions.
  • The interconnect topology, buffering primitives, and arbitration may introduce latencies. These are problematic if the peripheral has multiple bus interfaces.

Memory reorderings introduced by the compiler or the CPU cannot be detected by amaranth-soc, as they are outside its scope. Therefore, adding synchronization primitives to the interconnect or peripheral isn't enough to mitigate them.

To be effective, synchronization needs to be implemented end-to-end. In such cases, the BSP generated by amaranth-soc should provide constraints (e.g. memory barriers) to the compiler and the CPU's memory controller.
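The flash-controller hazard, and the effect of an end-to-end barrier, can be illustrated with a small plain-Python model. It only covers interconnect latency (not compiler or CPU reordering, which a real barrier would also have to constrain), assumes one write is issued per cycle, and the `run` helper and its "barrier" operation are hypothetical.

```python
def run(ops, latency):
    """Replay driver operations and return written values in arrival order.

    `ops` is a list of ("write", port, value) or ("barrier",) entries.
    Writes are issued one per cycle; a barrier stalls issue until every
    earlier write has arrived, modelling an end-to-end fence.
    """
    cycle = 0
    in_flight = []  # (arrival cycle, sequence number, value)
    for op in ops:
        if op[0] == "barrier":
            if in_flight:
                cycle = max(cycle, max(arrival for arrival, _, _ in in_flight))
            continue
        _, port, value = op
        in_flight.append((cycle + latency[port], len(in_flight), value))
        cycle += 1
    return [value for _, _, value in sorted(in_flight)]


latency = {"wb": 4, "csr": 1}
unsafe = [("write", "wb", "word0"), ("write", "wb", "word1"),
          ("write", "csr", "program")]
safe = [("write", "wb", "word0"), ("write", "wb", "word1"),
        ("barrier",), ("write", "csr", "program")]
```

Without the barrier, the "program" bit reaches the peripheral before the data words; with it, the data arrives first. In a real BSP the barrier would map to whatever fence mechanism the compiler and CPU provide.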

jfng, Jun 06 '23 22:06

Thanks for summarizing this, JF! All of this makes sense to me.

whitequark, Jun 07 '23 08:06