litedram
Add Reordering support
Ports can now expose banks to the user and can allow accesses to the memory to be reordered.
To implement reordering, we could create a module that would work on two native ports:
- one used by the user with all accesses in order and not exposing the banks.
- one used internally with reordered accesses and exposing the banks.
user_port = LiteDRAMNativePort(..., with_reordering=False)
internal_port = LiteDRAMNativePort(..., with_reordering=True)
class LiteDRAMReordering(Module):
    def __init__(self, user_port, internal_port):
        [...]
We could implement this scenario:
For writes:
- add bank cmd buffers (only store row/col).
- each bank cmd buffer maintains an in/out count.
- add data ram (min depth = nbanks * depth of cmd buffers)
- redirect write cmd to the proper bank cmd buffer.
- write cmd is accepted if the proper bank buffer is not full.
- when a bank cmd buffer accepts the cmd, accept the data and store it in the data ram at location (bank << log2(buffer depth)) + the bank's in count.
- when a bank cmd buffer outputs the cmd, retrieve the data from the data ram at location (bank << log2(buffer depth)) + the bank's out count and put it in the data queue that will be presented to the crossbar.
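To make the write steps above concrete, here is a minimal behavioural sketch in plain Python (not Migen, and not what the branch implements). The class name `WriteReorderModel` and its `write`/`issue` methods are hypothetical, and it assumes the buffer depth is a power of two so the address can be formed with a shift.

```python
from collections import deque


class WriteReorderModel:
    """Software model of the proposed write path (illustrative only)."""

    def __init__(self, nbanks, depth):
        assert depth & (depth - 1) == 0, "depth must be a power of two"
        self.depth = depth
        self.shift = depth.bit_length() - 1            # log2(depth)
        # One cmd buffer per bank, storing only (row, col).
        self.cmd_buffers = [deque() for _ in range(nbanks)]
        self.in_count = [0] * nbanks
        self.out_count = [0] * nbanks
        # Data ram with the minimum depth: nbanks * depth.
        self.data_ram = [None] * (nbanks * depth)

    def _addr(self, bank, count):
        # (bank << log2(depth)) + count, with the count wrapping at depth.
        return (bank << self.shift) + (count % self.depth)

    def write(self, bank, row, col, data):
        """User side: returns True if the cmd and its data are accepted."""
        if len(self.cmd_buffers[bank]) == self.depth:
            return False                               # bank buffer full
        self.cmd_buffers[bank].append((row, col))
        self.data_ram[self._addr(bank, self.in_count[bank])] = data
        self.in_count[bank] += 1
        return True

    def issue(self, bank):
        """Internal side: pop the oldest cmd of a bank with its data."""
        if not self.cmd_buffers[bank]:
            return None
        row, col = self.cmd_buffers[bank].popleft()
        data = self.data_ram[self._addr(bank, self.out_count[bank])]
        self.out_count[bank] += 1
        return (bank, row, col, data)
```

Calling `issue()` on banks in any order models the reordering: each bank stays in order internally, while the order between banks is free.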
For reads:
- maintain a global cmd_in and data_out count.
- add bank cmd buffers (only store row/col/cmd_in count).
- add data ram (min depth = nbanks * depth of cmd buffers).
- redirect read cmd to the proper bank cmd buffer.
- read cmd is accepted if the proper bank buffer is not full and if read data corresponding to the same cmd_in count value has been presented to the user (should be cmd_in count + 1 != data_out count).
- when a bank accepts a cmd, put the cmd_in count value output by the cmd buffer in a queue; use this queue to know where to store the next returned data in the data ram.
- use a flip bit in data ram to indicate that data has been updated (flip this bit each time a location is used).
- read data at data_out count location, if bit has flipped, present the data and increment data_out count to read next location.
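The read steps above can be sketched the same way, again as a plain Python behavioural model rather than Migen. All names (`ReadReorderModel`, `read`, `issue`, `data_return`, `pop`) are hypothetical. The sketch assumes the memory returns data in the order cmds were issued, that the counts wrap at the data ram depth, and that the tag is queued when the cmd leaves its bank buffer; these are my reading of the proposal, not confirmed details.

```python
from collections import deque


class ReadReorderModel:
    """Software model of the proposed read path (illustrative only)."""

    def __init__(self, nbanks, depth):
        self.depth = depth
        self.size = nbanks * depth                 # data ram depth
        # One cmd buffer per bank, storing (row, col, cmd_in count).
        self.cmd_buffers = [deque() for _ in range(nbanks)]
        self.cmd_in = 0                            # global counts
        self.data_out = 0
        self.data_ram = [None] * self.size
        self.flip = [0] * self.size                # flip bit per location
        self.expected = 1                          # flip value meaning "fresh"
        self.tags = deque()                        # tags of issued cmds

    def read(self, bank, row, col):
        """User side: returns True if the read cmd is accepted."""
        if len(self.cmd_buffers[bank]) == self.depth:
            return False                           # bank buffer full
        if (self.cmd_in + 1) % self.size == self.data_out:
            return False                           # data ram slot not free yet
        self.cmd_buffers[bank].append((row, col, self.cmd_in))
        self.cmd_in = (self.cmd_in + 1) % self.size
        return True

    def issue(self, bank):
        """Internal side: send the oldest cmd of a bank to the memory."""
        if not self.cmd_buffers[bank]:
            return None
        row, col, tag = self.cmd_buffers[bank].popleft()
        self.tags.append(tag)                      # where its data must land
        return (bank, row, col)

    def data_return(self, data):
        """Memory side: store returned data at its tag and flip the bit."""
        tag = self.tags.popleft()
        self.data_ram[tag] = data
        self.flip[tag] ^= 1

    def pop(self):
        """User side: next in-order read data, or None if not ready."""
        if self.flip[self.data_out] != self.expected:
            return None
        data = self.data_ram[self.data_out]
        self.data_out += 1
        if self.data_out == self.size:             # wrapped: expect other value
            self.data_out = 0
            self.expected ^= 1
        return data
```

The user always sees data in request order, because `pop()` only advances when the location at the data_out count has been refreshed, even if the banks were served in a different order.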
Work has started in the reordering branch. I'm focusing on optimizing read/write grouping right now which doesn't require an API change.
At low speeds (~800MT/s) row open/close is well hidden with good dispersion among banks so we don't need to re-order based upon activate/precharge commands. The primary bottleneck at this speed is read/write bus transitions.
The reordering branch now has 1 level of read/write reordering. This allows us to turn R->W->R->W into R->R->W->W while still allowing us to detect conflicts. Provided you have good dispersion amongst banks, this moves us from 20% bus efficiency to 60%, which is quite dramatic. I'm implementing conflict detection so we don't reorder reads/writes to the same address. This means users of the naive interface can benefit.
https://github.com/enjoy-digital/litedram/pull/55
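For illustration, the one-level read/write grouping with conflict detection described above could look roughly like this in plain Python. This is a sketch of the idea, not the code in the branch: `group_reads_writes` and `bus_turnarounds` are hypothetical names, commands are modelled as `(kind, addr)` tuples, and a read is only hoisted over the single write just before it when the addresses differ.

```python
def group_reads_writes(cmds):
    """Return a copy of cmds where R->W->R becomes R->R->W when safe.

    cmds is a list of (kind, addr) tuples with kind "R" or "W".
    Each command moves by at most one position (1 level of reordering).
    """
    out = list(cmds)
    i = 1
    while i + 1 < len(out):
        prev_kind = out[i - 1][0]
        (k0, a0), (k1, a1) = out[i], out[i + 1]
        # Hoist the read over the write only if it joins a preceding read
        # and the two commands do not target the same address (conflict).
        if prev_kind == "R" and k0 == "W" and k1 == "R" and a0 != a1:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2          # both commands have moved once, skip them
        else:
            i += 1
    return out


def bus_turnarounds(cmds):
    """Count read/write direction changes on the data bus."""
    return sum(1 for a, b in zip(cmds, cmds[1:]) if a[0] != b[0])


seq = [("R", 0), ("W", 1), ("R", 2), ("W", 3)]
grouped = group_reads_writes(seq)
# grouped is R, R, W, W: 1 bus turnaround instead of 3

conflict = [("R", 0), ("W", 1), ("R", 1), ("W", 3)]
kept = group_reads_writes(conflict)
# kept is unchanged: the read of address 1 must stay behind the write to it
```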