AXI Demux could be extended to send requests with same ID to different slaves
With multiple outstanding write requests going to different slaves in axi_demux, the demux waits for the B responses from the first request before sending the second request (AW) to a different slave. This can have a significant performance impact when using slow slaves.
Ideally, the demux would forward all possible outstanding requests (within reason and protocol specifications) to the corresponding slaves, keeping track internally of ID and ordering across different slaves. The responses will need to be stalled according to ordering specifications across the different slaves.
This behavior currently also affects reads.
This behavior was a deliberate design decision I made when writing the module, in order to adhere to the AXI4 ordering model (Chapter 6). The previous implementation interleaved responses with the same ID when they were forwarded to different subordinates.
The two main reasons I chose this were:
- The demux module has no view of the overall system, so the mechanism that keeps order has to be local to the demux.
- Avoiding reorder buffers in general, as they could take a lot of area, especially for reads. This module is used quite often in the overall AXI network, so its area needs to be minimized.
So I decided to stall at the demux subordinate port if a transaction with the same ID is in flight to a different manager port than the one requested. I decided against reordering and stalling on the response path because it could lead to stalling or even deadlock if the buffer size is not chosen adequately.
Ordering is currently handled by the counter module inside the demux. It should forward transactions with the same ID to the same manager port, and it should also forward transactions with different IDs to different manager ports. It should only stall if a transaction with a given ID targets a different manager port than the one currently allocated to that ID.
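The stall rule just described can be modelled in a few lines of Python. This is a behavioural sketch only, not the RTL of the counter module: the class name `IdCounters` and its methods are hypothetical, and the full ID is used as the key here.

```python
class IdCounters:
    """Behavioural model of per-ID in-flight counters (hypothetical names)."""

    def __init__(self):
        # axi_id -> [manager_port, number of in-flight transactions]
        self.inflight = {}

    def can_forward(self, axi_id, port):
        """A request may go out if its ID is idle or already bound to `port`."""
        entry = self.inflight.get(axi_id)
        return entry is None or entry[0] == port

    def push(self, axi_id, port):
        """Record a forwarded request; returns False if it must stall."""
        if not self.can_forward(axi_id, port):
            return False
        entry = self.inflight.setdefault(axi_id, [port, 0])
        entry[1] += 1
        return True

    def pop(self, axi_id):
        """Record a completed transaction for `axi_id`."""
        entry = self.inflight[axi_id]
        entry[1] -= 1
        if entry[1] == 0:
            del self.inflight[axi_id]


ctr = IdCounters()
results = [
    ctr.push(axi_id=3, port=0),  # first request with ID 3
    ctr.push(axi_id=3, port=0),  # same ID, same port: forwarded
    ctr.push(axi_id=5, port=1),  # different ID, different port: forwarded
    ctr.push(axi_id=3, port=1),  # same ID, different port: stalls
]
ctr.pop(3)
ctr.pop(3)
results.append(ctr.push(axi_id=3, port=1))  # ID 3 has drained: forwarded
print(results)  # [True, True, True, False, True]
```

The fourth request is the case this issue is about: it is only accepted once both earlier transactions with ID 3 have completed.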
What could be an issue here, though, is that to keep the overall counter size down, the module only looks at a sub-slice of the transaction ID and treats different IDs that share this slice as the same! (That's what the parameter AxiLookBits does.) Again, I made this design choice to keep the overall area of the module down: the array of counters grows exponentially with the number of ID bits considered. So the module can stall transactions even if their IDs are different.
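As an illustration of this aliasing, the sketch below indexes the counters with the `AxiLookBits` least-significant ID bits. The helper names `counter_index` and `num_counters` and the parameter values are hypothetical, and one counter per possible slice value is an assumption made to show the exponential growth, not something read from the RTL.

```python
def counter_index(axi_id: int, look_bits: int) -> int:
    """Keep only the `look_bits` least-significant bits of the ID."""
    return axi_id & ((1 << look_bits) - 1)


def num_counters(look_bits: int) -> int:
    """One counter per possible value of the ID slice (assumed here)."""
    return 1 << look_bits


look_bits = 3                   # illustrative value for AxiLookBits
id_a, id_b = 0b00101, 0b10101   # distinct 5-bit IDs with the same low 3 bits

alias_sliced = counter_index(id_a, look_bits) == counter_index(id_b, look_bits)
alias_full = counter_index(id_a, 5) == counter_index(id_b, 5)
sizes = [num_counters(n) for n in (3, 5, 8)]

print(alias_sliced)  # True: both IDs land on the same counter
print(alias_full)    # False: with the full ID width they are kept apart
print(sizes)         # [8, 32, 256]
```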
Some ideas to solve this:
- Use a different ID for different target subordinates which do not collide in the counters (probably quite troublesome to find a good set, especially when multiple axi_xbar are involved).
- Revamp the internal axi_demux_id_counters to a different data structure so that the IDs no longer collide. I could see a sort of associative list that keeps track of the in-flight transactions without cutting out the ID to index the array.
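One way to picture the second idea is a small associative table that stores the complete ID of each in-flight transaction. This is a sketch under assumptions: the name `AssocTracker`, the fixed `capacity`, and the policy of stalling when the table is full are all hypothetical and not part of the existing module.

```python
class AssocTracker:
    """Bounded table keyed by the full ID (hypothetical sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = {}  # full axi_id -> [manager_port, in-flight count]

    def try_push(self, axi_id: int, port: int) -> bool:
        entry = self.entries.get(axi_id)
        if entry is not None:
            if entry[0] != port:
                return False      # genuine ID conflict: stall
            entry[1] += 1
            return True
        if len(self.entries) >= self.capacity:
            return False          # table full: stall (assumed policy)
        self.entries[axi_id] = [port, 1]
        return True

    def pop(self, axi_id: int) -> None:
        entry = self.entries[axi_id]
        entry[1] -= 1
        if entry[1] == 0:
            del self.entries[axi_id]


trk = AssocTracker(capacity=2)
log = [
    trk.try_push(0b00101, port=0),
    trk.try_push(0b10101, port=1),  # same low bits as above, but no false conflict
    trk.try_push(0b00101, port=1),  # same full ID, different port: stall
    trk.try_push(0b11111, port=0),  # table full: stall
]
trk.pop(0b00101)
log.append(trk.try_push(0b11111, port=0))  # an entry was freed: accepted
print(log)  # [True, True, False, False, True]
```

In such a scheme the storage would scale with the number of tracked transactions rather than with the ID width, at the cost of comparing the incoming ID against every entry.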
Is this issue a question, a bug report, or an enhancement proposal?
If question: The axi_demux documentation states
When the demultiplexer receives two transactions with the same ID and direction (i.e., both read or both write) but targeting two different master ports, it will not accept the second transaction until the first has completed. During this time, the demultiplexer stalls the AR or AW channel, respectively. To determine whether two transactions have the same ID, the AxiLookBits least-significant bits are compared. That parameter can be set to the full AxiIdWidth to avoid false ID conflicts, or it can be set to a lower value to reduce area and delay at the cost of more false conflicts.
The rationale behind this behavior has been explained above by @WRoenninger. If the documentation is unclear or insufficient, please suggest an enhancement.
If instead this is a bug report, can you please describe how the actual behavior differs from the specified behavior?
If this is an enhancement proposal, I suggest having a thorough architectural discussion before starting implementation efforts.
This is definitely an enhancement proposal. @WRoenninger's reasoning of course makes sense, but there are some performance implications, e.g. for large DMA transfers across interleaved slaves that are far away from the demux. As the demux still performs according to spec, this has a low priority. The goal of this issue was to document the performance implications and possibly launch a more detailed architectural discussion. AFAIK no implementation efforts have started.
I agree that there can be situations where the performance of the demux can be improved by letting it issue requests with the same ID to multiple master ports. Due to the AXI ordering model (and as you also commented), this will require a reorder buffer inside the demux, so that the demux returns responses to upstream in request order even though it may get responses out of order from downstream modules. This reorder buffer is not simple to parametrize and quickly requires a lot of area (at least for reads).
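To illustrate what such a reorder buffer has to do, and why its depth matters, here is a minimal behavioural sketch for a single ID. The names `ReorderBuffer`, `issue` and `respond` are hypothetical, and each response is modelled as a single item rather than a burst.

```python
from collections import deque


class ReorderBuffer:
    """Returns responses upstream in request order (behavioural sketch)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.order = deque()  # tags of outstanding requests, in issue order
        self.done = {}        # tag -> response that arrived but is not yet released
        self.next_tag = 0

    def issue(self):
        """Reserve a slot for a new request; None means the demux must stall."""
        if len(self.order) >= self.depth:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.order.append(tag)
        return tag

    def respond(self, tag, payload):
        """Accept a downstream response and release whatever is now in order."""
        self.done[tag] = payload
        released = []
        while self.order and self.order[0] in self.done:
            released.append(self.done.pop(self.order.popleft()))
        return released


rob = ReorderBuffer(depth=2)
t0 = rob.issue()  # request to a slow port
t1 = rob.issue()  # request to a fast port
t2 = rob.issue()  # buffer full: this request would stall
out = [
    rob.respond(t1, "B1"),  # fast port answers first: held back
    rob.respond(t0, "B0"),  # slow port answers: both released in request order
]
print(t2, out)  # None [[], ['B0', 'B1']]
```

For writes a slot only has to hold a B response, whereas for reads it would have to hold the read data of a whole burst, which is where the area concern for reads comes from.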
AXI specifies that a transaction has no ordering constraint with respect to another transaction with a different ID. With this, the master, which assigns the ID, is responsible for dealing with out-of-order responses. I recommend further exploring that direction and only investigating this issue in more detail when there is sufficient evidence that different IDs are not the way to go. I am happy to discuss this if you'd like.
I have taken the liberty to modify the title so it's more clear that this is a potential enhancement and not a bug.
Hi @andreaskurth @WRoenninger
With multiple outstanding write requests going to different slaves in axi_demux, the demux waits for the B responses from the first request before sending the second request (AW) to a different slave. This can have a significant performance impact when using slow slaves.
Ideally, the demux would forward all possible outstanding requests (within reason and protocol specifications) to the corresponding slaves, keeping track internally of ID and ordering across different slaves. The responses will need to be stalled according to ordering specifications across the different slaves.
This behavior currently also affects reads.
I couldn't agree more with @micprog. Looking at how ARM's commercial NoC products evolved from NIC-400 to NI-710AE, Single slave per ID (a NIC-400 concept) is no longer mandatory for AXI writes in the NI-710AE. Instead, an always-on B response reorder buffer is introduced for each axi_demux to improve performance. Additionally, an optional R response reorder buffer of limited depth is available to improve AXI reads.
Note on the concept of Single slave per ID (from NIC-400):
Single slave per ID
This ensures that at an ASIB:
- All outstanding read transactions with the same ID go the same destination.
- All outstanding write transactions with the same ID go the same destination.
When the ASIB receives a transaction:
- If it has an ID that does not match any outstanding transactions, it passes the CDAS.
- If it has an ID that matches the ID of an outstanding transaction, and the destinations also match, it passes the CDAS.
- If it has an ID that matches the ID of an outstanding transaction, and the destinations do not match, it fails the CDAS check and is stalled.
A stalled transaction remains stalled until one of the rules passes.
IP tools automatically detect when this is required. See Additional reading.
Depending on the configured topology and ASIB CDAS scheme, there is still a possibility for a cyclic dependency deadlock because of the AW and W channel ordering rules. This is detected by the IP tool and is indicated by a loop error.
You can resolve this by either changing:
- The configuration topology.
- All but one of the ASIB CDAS schemes that feed into the loop to a single slave.
- A single switch slave interface on the loop to a Single Active Slave (SAS).