Support for two-cycle read latency on SRAMs
Type of issue: feature request
Impact: API addition (no impact on existing code) | API modification
Development Phase: request
What is the use case for changing the behavior?
The `SyncReadMem` construct lets users specify an SRAM with one-cycle write and read latencies. However, some memory compilers let you build a register bank into the read path of an SRAM (which is usually a very tight timing path), resulting in an SRAM with a two-cycle read latency. In Chisel, you cannot specify an SRAM which matches that behavior.
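For reference, this is what the current one-cycle behavior looks like in Chisel (a minimal sketch with illustrative port names and sizes):

```scala
import chisel3._

// A SyncReadMem read port: the address is registered, so read data is
// available exactly one cycle after the address is presented.
class OneCycleMem extends Module {
  val io = IO(new Bundle {
    val raddr = Input(UInt(10.W))
    val ren   = Input(Bool())
    val rdata = Output(UInt(32.W))
  })
  val mem = SyncReadMem(1024, UInt(32.W))
  io.rdata := mem.read(io.raddr, io.ren)
}
```

There is no knob here to ask for a second cycle of read latency.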
FIRRTL supports SRAMs with arbitrary read/write latencies. I believe this is the current code path for going from `SyncReadMem` instances to FIRRTL memory objects:
- When users create a `SyncReadMem` instance, a `DefSeqMemory` is created. This gets converted to a FIRRTL `CDefMemory` object here: https://github.com/chipsalliance/chisel3/blob/df1b4aaf06cbca60bb48c3697d478dcdba48af36/core/src/main/scala/chisel3/internal/firrtl/Converter.scala#L130-L131
- Those `CDefMemory`s are converted to `DefMemory`s in the `RemoveCHIRRTL` pass, where the `readLatency` and `writeLatency` parameters are set to 1: https://github.com/chipsalliance/firrtl/blob/c00a4ebb0608f9ef98729e9b610a2678be2bc4fd/src/main/scala/firrtl/passes/RemoveCHIRRTL.scala#L133-L144
So, this would seem to be a feature of FIRRTL that Chisel doesn't currently expose. I don't know what the right API would be for this, or what range of read (and maybe write) latencies Chisel should expose.
Since FIRRTL associates the read/write latencies with the `DefMemory` object, I don't think we can have different ports with different latencies, so it wouldn't make sense to specify latency in the `read()` or `write()` methods of `SyncReadMem`. So, we could add `readLatency`/`writeLatency` constructor parameters to `SyncReadMem` which default to 1 and are `require`d to be 1 or greater.
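A hypothetical sketch of that constructor-parameter shape (this API does not exist in Chisel today; a real implementation would have to thread the latencies through `DefSeqMemory` down to the FIRRTL `DefMemory`):

```scala
import chisel3._

// Hypothetical factory illustrating the proposed API surface; the name
// LatencySyncReadMem and the parameter plumbing are made up for this sketch.
object LatencySyncReadMem {
  def apply[T <: Data](
      size: Int,
      t: T,
      readLatency: Int = 1,
      writeLatency: Int = 1): SyncReadMem[T] = {
    require(readLatency >= 1, "read latency must be at least 1 cycle")
    require(writeLatency >= 1, "write latency must be at least 1 cycle")
    // Placeholder: returns an ordinary one-cycle SyncReadMem, since the
    // latencies cannot currently be passed through to FIRRTL.
    SyncReadMem(size, t)
  }
}
```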
While FIRRTL does technically support memories with >1 cycle of read latency, they are (usually) lowered to combinational-read memories in a fairly simplistic manner.
While the perennial argument for adding higher-level constructs to FIRRTL (or in this case, the "dialect" of FIRRTL that Chisel actually uses) is the power of preserving design intent, multi-cycle memories need a fair number of extra microarchitectural parameters (register slice placement, collision behavior) to allow a decent level of control over behaviour that's somewhat coupled with the design of fairly critical timing paths. The current workaround of using RegNext + memory does offer that granular control, but it prevents black-boxing these memories, so it seems like it would be worth considering whether to add support for automatically black-boxing this space of memories before adding them to Chisel.
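The RegNext-plus-memory workaround mentioned above looks roughly like this (a sketch with illustrative names): the first cycle of latency comes from `SyncReadMem` itself, and the second from an explicit output register.

```scala
import chisel3._

// Two-cycle read latency built from standard Chisel primitives. This models
// the SRAM-with-output-flops behavior, but the extra register is a separate
// construct, so it cannot be black-boxed together with the memory.
class TwoCycleReadMem extends Module {
  val io = IO(new Bundle {
    val raddr = Input(UInt(10.W))
    val ren   = Input(Bool())
    val rdata = Output(UInt(32.W))
  })
  val mem = SyncReadMem(1024, UInt(32.W))
  io.rdata := RegNext(mem.read(io.raddr, io.ren))
}
```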
I'm not sure whether there would be momentum for enhancing the `vlsi_mem_gen` tool (or whatever its current equivalent is) to support this space of memories. Unfortunately, I'm not familiar with any other extant interfaces between FIRRTL and ASIC memory compilers, but maybe other users are?
It's worth noting that if a user is willing to implement a custom layer to tie a FIRRTL flow into a memory compiler, it's totally possible to accommodate both functional models (with arbitrary collision / randomization / enable behavior) and automatic blackboxing by defining multi-cycle memories as modules rather than first-class memories. This also sidesteps the issue of the somewhat non-intuitive behavioral memory API.
@tymcauley maybe you could describe some more details about your use case. How exactly do you plan to integrate the SRAM macros? Are there any special pins or parameters for your SRAMs that need to be specified but do not map to anything in Chisel/firrtl?
Sorry for the long delay on this!
We're currently using Chipyard, so I was hoping to use the Barstools `MacroCompiler` to map these Chisel SRAMs to foundry SRAM Verilog models. And @albert-magyar you're exactly right about the `SyncReadMem`-plus-`RegNext` solution: while that would model this SRAM's functionality correctly, we wouldn't be able to map that to the desired foundry SRAM model.
These SRAMs with the integrated output flops can have an output-enable pin, which effectively clock-gates the output flops.
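That output-enable behavior can at least be modeled today with `RegEnable` from `chisel3.util` (a sketch; the `outEn` pin name is illustrative): when the enable is low, the output register holds its previous value, mimicking a clock-gated output flop.

```scala
import chisel3._
import chisel3.util.RegEnable

// Models an SRAM whose integrated output register has an output-enable pin.
// The functional model is correct, but as with the plain RegNext workaround,
// this cannot be mapped onto the foundry macro as a single black box.
class GatedOutputMem extends Module {
  val io = IO(new Bundle {
    val raddr = Input(UInt(10.W))
    val ren   = Input(Bool())
    val outEn = Input(Bool()) // output-enable: gates the output register
    val rdata = Output(UInt(32.W))
  })
  val mem = SyncReadMem(1024, UInt(32.W))
  io.rdata := RegEnable(mem.read(io.raddr, io.ren), io.outEn)
}
```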
It's very possible this problem isn't going to be very easy to solve with a native Chisel representation, but I'd love to document whatever the eventual solution is for future users. I'm also curious whether this might have already been solved for FPGA-based flows, since those RAMs can often have output flops (sometimes several cycles' worth) integrated into them.
@albert-magyar I'm curious what you mean by "tying a FIRRTL flow into a memory compiler". What is the idea there?
@tymcauley Do you know if Chipyard uses the `ReplSeqMem` transform to do the blackboxing? That one is currently limited to a small subset of supported firrtl memories: i.e. rw, or w + r (read and write with 1 cycle of delay), with read-under-write "undefined."
src: https://github.com/chipsalliance/firrtl/blob/ebad877af9d9fe43a92f64f8b63ca8904834cc4d/src/main/scala/firrtl/passes/memlib/ToMemIR.scala#L18
Could be potentially extended though, we just need someone who can prototype that flow and tell us how to best interface with the foundry SRAM compilers. Most people working on firrtl and chisel (including myself) do not actually have access to any of the backend tools.
I'm not totally sure about that. I can see that the `ReplSeqMem` transform runs during the FIRRTL-compiler stage in Chipyard builds, but then the `MacroCompiler` transform from Barstools is also run on the resulting output: https://github.com/ucb-bar/chipyard/blob/534fde1c0f46a29933e12e23f40d54afc48f4820/common.mk#L158-L173