chisel icon indicating copy to clipboard operation
chisel copied to clipboard

Support for two-cycle read latency on SRAMs

Open tymcauley opened this issue 3 years ago • 5 comments

Type of issue: feature request

Impact: API addition (no impact on existing code) | API modification

Development Phase: request

Other information

If the current behavior is a bug, please provide the steps to reproduce the problem:

What is the current behavior?

What is the expected behavior?

Please tell us about your environment:

What is the use case for changing the behavior?

The SyncReadMem construct lets users specify an SRAM with one-cycle write and read latencies. However, some memory compilers let you build a register bank into the read path of an SRAM (which is usually a very tight timing path), resulting in an SRAM with a two-cycle read latency. In Chisel, you cannot specify an SRAM which matches that behavior.

FIRRTL supports SRAMs with arbitrary read/write latencies. I believe this is the current code path for going from SyncReadMem instances to FIRRTL memory objects:

  • When users create a SyncReadMem instance, a DefSeqMemory is created. This gets converted to a FIRRTL CDefMemory object here: https://github.com/chipsalliance/chisel3/blob/df1b4aaf06cbca60bb48c3697d478dcdba48af36/core/src/main/scala/chisel3/internal/firrtl/Converter.scala#L130-L131
  • Those CDefMemorys are converted to DefMemorys in the RemoveCHIRRTL pass, where the readLatency and writeLatency parameters are set to 1: https://github.com/chipsalliance/firrtl/blob/c00a4ebb0608f9ef98729e9b610a2678be2bc4fd/src/main/scala/firrtl/passes/RemoveCHIRRTL.scala#L133-L144

So, this would seem to be a feature of FIRRTL that Chisel doesn't currently expose. I don't know what the right API would be for this, or what range of read (and maybe write) latencies Chisel should expose.

Since FIRRTL associates the read/write latencies with the DefMemory object, I don't think we can have different ports with different latencies, so it wouldn't make sense to specify latency in the read() or write() methods of SyncReadMem. So, we could add readLatency/writeLatency constructor parameters to SyncReadMem which default to 1 and are required to be 1 or greater.

tymcauley avatar Dec 21 '21 04:12 tymcauley

While FIRRTL does technically support memories with >1 cycle of read latency, they are (usually) lowered to combination-read memories in a fairly simplistic manner.

While the perennial argument for adding higher-level constructs to FIRRTL (or in this case, the "dialect" of FIRRTL that Chisel actually uses) is the power of preserving design intent, multi-cycle memories need a fair amount of extra microarchitectural parameters (register slice placement, collision behavior) to allow a decent level of control over behaviour that's somewhat coupled with the design of fairly critical timing paths. The current workaround of using RegNext + memory does offer that granular control but prevents black-boxing these memories, so it seems like it would be worth considering whether to add support for automatically black-boxing this space of memories before adding them to Chisel.

I'm not sure whether there would be momentum for enhancing the vlsi_mem_gen tool (or whatever its current equivalent is) to support this space of memories. Unfortunately, I'm not familiar with any other extant interfaces between FIRRTL and ASIC memory compilers, but maybe other users are?

It's worth noting that if a user is willing to implement a custom layer to tie a FIRRTL flow into a memory compiler, it's totally possible to accommodate both functional models (with arbitrary collision / randomization / enable behavior) and automatic blackboxing by defining multi-cycle memories as modules rather than first-class memories. This also sidesteps the issue of the somewhat non-intuitive behavioral memory API.

albert-magyar avatar Dec 21 '21 06:12 albert-magyar

@tymcauley maybe you could describe some more details about your use case. How exactly do you plan to integrate the SRAM macros? Are there any special pins or parameters for your SRAMs that need to be specified but do not map to anything in Chisel/firrtl?

ekiwi avatar Dec 21 '21 16:12 ekiwi

Sorry for the long delay on this!

We're currently using Chipyard, so I was hoping to use Barstools MacroCompiler to map these Chisel SRAMs to foundry SRAM Verilog models. And @albert-magyar you're exactly right about the SyncReadMem-plus-RegNext solution: while that would model this SRAM's functionality correctly, we wouldn't be able to map that to the desired foundry SRAM model.

These SRAMs with the integrated output flops can have an output-enable pin, which effectively clock-gates the output flops.

It's very possible this problem isn't going to be very easy to solve with a native Chisel representation, but I'd love to document whatever the eventual solution is for future users. I'm also curious if this might have already been solved for FPGA-based flows, since those RAMs can often have output flops (sometimes several cycles worth) integrated into them.

@albert-magyar I'm curious what you mean by "tying a FIRRTL flow into a memory compiler". What is the idea there?

tymcauley avatar Jan 04 '22 22:01 tymcauley

@tymcauley Do you know if chipyard used the ReplSeqMem transform to do the blackboxing? That one is currently limited to a small sub-set of supported firrtl memories:

i.e. rw, w + r (read, write 1 cycle delay) and read-under-write "undefined."

src: https://github.com/chipsalliance/firrtl/blob/ebad877af9d9fe43a92f64f8b63ca8904834cc4d/src/main/scala/firrtl/passes/memlib/ToMemIR.scala#L18

Could be potentially extended though, we just need someone who can prototype that flow and tell us how to best interface with the foundry SRAM compilers. Most people working on firrtl and chisel (including myself) do not actually have access to any of the backend tools.

ekiwi avatar Jan 04 '22 22:01 ekiwi

I'm not totally sure about that. I can see that the ReplSeqMem transform runs during the FIRRTL-compiler stage in Chipyard builds, but then the MacroCompiler transform from Barstools is also run on the resulting output: https://github.com/ucb-bar/chipyard/blob/534fde1c0f46a29933e12e23f40d54afc48f4820/common.mk#L158-L173

tymcauley avatar Feb 10 '22 21:02 tymcauley