calyx Add more memory primitives


Just wanted to start some discussion about adding more memory primitives to Calyx. I will be putting development effort in here, but want some feedback before I really get started. The main additions I want to make are memories with multiple ports (simple dual port, true dual port, etc.), memories with variable latency, and memories that map to different primitives (BRAM, LUTRAM, UltraRAM).

Obviously dual port memories need separate primitives as they have different port configurations, but it would be good to limit the number of new primitives somehow. It seems like bad form to have a separate primitive for every combination of address count, port semantics, latency, and primitive.

Looking forward to any ideas on how to best implement this.

Jun 15 '22 18:06 andrewb1999

Thanks for starting this @andrewb1999! I think short term it makes sense to add these things as individual primitives. I'm going to loop in @sampsyo and @EclecticGriffin on the discussion here too but I think in general, I'm expecting the following flow:

The frontend generates a bunch of primitive definitions for the verilog implementations it will generate for the memories.
The Calyx program imports those and optimizes the designs against the interface
Post-calyx lowering, verilog implementations get linked together

Unfortunately, this still runs into the problem of defining "one primitive for every combination of address count, port semantics, etc." However, this approach at least avoids committing a billion primitives into the repo. Instead, they are generated only on demand.

There is also @sampsyo proposal for Modules in Calyx that could be a possible solution but will require more work: https://github.com/cucapra/calyx/discussions/419

Jun 16 '22 15:06 rachitnigam

This is really exciting! I would love to chat more about this; it seems potentially really useful.

The vision exactly as @andrewb1999 & @rachitnigam lay it out here sounds perfect to me. To summarize, the idea is that we will have a trillion Calyx-exposed Verilog primitives, one for every combination of parameters. This would be intensely painful to code up by hand, so instead this will be a generator. You specify the number of ports, the memory geometry, latencies, etc., and it generates a Calyx declaration and Verilog implementation for you. We'll come up with a way of specifying an entire range of such parameters to generate a whole swath of memory primitives at once. That range could correspond, for example, to all the memory configurations available on a given family of FPGAs.
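To make "specifying an entire range of parameters" concrete, here is a minimal sketch of the sweep step only. The `MemConfig` fields and the example ranges are hypothetical placeholders. They are not the real capabilities of any FPGA family. A real generator would then emit a Calyx declaration and a Verilog implementation for each config.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class MemConfig:
    """One point in a hypothetical memory-generator parameter space."""
    kind: str   # e.g. "bram", "lutram", "uram"
    width: int  # bits per element
    size: int   # number of elements
    ports: int  # number of ports


def sweep(kinds, widths, sizes, ports):
    """Yield one MemConfig for every combination of the given parameter ranges."""
    for k, w, s, p in product(kinds, widths, sizes, ports):
        yield MemConfig(k, w, s, p)


# Placeholder ranges: 2 kinds * 3 widths * 2 sizes * 2 port counts = 24 configs.
configs = list(sweep(["bram", "lutram"], [8, 16, 32], [512, 1024], [1, 2]))
print(len(configs))  # 24
```

The count grows multiplicatively with each new parameter. That growth is why generating primitives on demand looks more attractive than committing all of them to the repo.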

@rachitnigam also makes a really good point about the connection to #419. This generator will be useful without that "modules" concept, but it would be a perfect use case for it in some hypothetical future after the basic version works.

I also want to bring up another farther-future idea that could build on this foundation: interfaces to off-chip memories. We could conceivably use a similar framework to this one to expose DRAM, HBM, etc. within Calyx. But obviously no need to worry about that for now; just covering on-chip memories (BRAMs, LUTRAMs, and UltraRAMs) is plenty for this stage.

Jun 18 '22 18:06 sampsyo

Sounds great. A few more questions I have:

Where should this generator be implemented? Within the Calyx Rust source?
Is this a "JIT" generator or an "AOT" generator? i.e. should the generator be called when Calyx encounters a memory that can't be met by one of the existing memory primitives or should it generate all of these ahead of time and store them as a library in the repo?

Jun 18 '22 21:06 andrewb1999

I think the "JIT" technique would work best, especially because primitive definitions aren't anything special and can live in the same file as Calyx components. We can generate all the primitive definitions and the corresponding verilog file on the side.

One other thing that occurs to me is that each program will have a particular set of memories it defines, with a particular WIDTH, number of ports, etc. I wonder if it would be possible to represent each such memory as a plain Calyx component. Put differently, it would be worth figuring out the minimal set of primitives needed to build such memories. In the case where each program truly needs a different, parameterizable memory, we will need to use primitive definitions. If not, we can use plain component definitions.

Jun 19 '22 17:06 rachitnigam

Indeed, good questions. Building off of @rachitnigam's answer, here's one plausible way to draw a roadmap:

Start with the AOT version, and include a "name mangling" scheme for representing a given memory configuration as a string. For example, maybe the command-line invocation memgen --bram --width 16 --size 1024 --ports 2 produces the Calyx declaration and Verilog implementation for a 2-port BRAM with 1024 16-bit elements. The primitive could get the mangled name mem_bram_16_1024_2 or somesuch. Client code can rely on this stable name mangling to know what to refer to when instantiating their memories.
In a next phase, it will be easy(ish) to "soup up" the AOT version with some program analysis for automation, as long as the name mangling scheme admits "detangling." That is, we could write a tool that analyzes a Calyx program, finds the primitives it uses whose names start with mem_, and generates those primitives accordingly. For example, it would parse the parameters out of the mem_bram_16_1024_2 string and invoke memgen as above. This could even be packaged up into a Calyx pass for maximum convenience.
In the long term, this style could inform the general pattern for "generator libraries" as envisioned in #419. The idea would be to let Calyx programs use rich, non-mangled primitive declarations like mem::bram[16, 1024, 2] or whatever, and then follow the above strategy to obtain implementations.
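A rough sketch of the mangling and "detangling" round trip described above might look like the following. The `memgen` flags come from the example invocation in this comment, but the tool itself is hypothetical, and so are the helper names `mangle`, `demangle`, and `used_memories`. The source scan uses a plain regex over the program text, which stands in for a real Calyx pass.

```python
import argparse
import re

# Memory kinds mentioned in this thread; the set is illustrative.
KINDS = ("bram", "lutram", "uram")

# Matches names like mem_bram_16_1024_2. Kind names contain no underscores and
# every other field is digits, so splitting a name back apart is unambiguous.
MANGLED = re.compile(r"\bmem_(bram|lutram|uram)_(\d+)_(\d+)_(\d+)\b")


def parse_memgen_args(argv):
    """Parse a hypothetical `memgen --bram --width 16 --size 1024 --ports 2` command line."""
    parser = argparse.ArgumentParser(prog="memgen")
    kind = parser.add_mutually_exclusive_group(required=True)
    for k in KINDS:
        kind.add_argument(f"--{k}", dest="kind", action="store_const", const=k)
    parser.add_argument("--width", type=int, required=True)
    parser.add_argument("--size", type=int, required=True)
    parser.add_argument("--ports", type=int, default=1)
    return parser.parse_args(argv)


def mangle(kind, width, size, ports):
    """Encode one memory configuration as a stable primitive name."""
    if kind not in KINDS:
        raise ValueError(f"unknown memory kind: {kind!r}")
    return f"mem_{kind}_{width}_{size}_{ports}"


def demangle(name):
    """Recover (kind, width, size, ports) from a mangled primitive name."""
    m = MANGLED.fullmatch(name)
    if m is None:
        raise ValueError(f"not a mangled memory name: {name!r}")
    kind, width, size, ports = m.groups()
    return kind, int(width), int(size), int(ports)


def used_memories(calyx_src):
    """Collect the distinct mem_* names that appear in Calyx source text."""
    return sorted({m.group(0) for m in MANGLED.finditer(calyx_src)})


args = parse_memgen_args(["--bram", "--width", "16", "--size", "1024", "--ports", "2"])
name = mangle(args.kind, args.width, args.size, args.ports)
print(name)            # mem_bram_16_1024_2
print(demangle(name))  # ('bram', 16, 1024, 2)
```

Once `used_memories` has collected the names, each one would be passed through `demangle`, and the resulting tuple would drive one generator invocation. Any parameter that the name does not encode, such as latency, would need its own field in the scheme.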

Anyway, up to you of course! But that somewhat decoupled generator tool thing could be a useful way to keep the problem simple at first…

As far as where the code lives, anything is 100% fine with me, but we'd be happy to put it in this monorepo!

Jun 20 '22 20:06 sampsyo

Thanks @sampsyo! I was actually in the middle of typing out some more questions but that answers a lot of them. That plan sounds like a reasonable method to move forward for now.

I also thought some about @rachitnigam's point. I think latencies can be added by a component that wraps some generated primitive and adds std_reg where necessary. I think Vivado will infer when these registers can be moved into the BRAM hardware itself when applicable. If we know exactly how the synthesis tool converts multidimensional addresses, we could also implement this conversion to single-dimension memories in Calyx.
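The address conversion mentioned above is easy to pin down once the layout is known. The sketch below assumes row-major order, but whether a given synthesis tool actually uses row-major is exactly the open question. A Calyx version would compute the same value with arithmetic primitives in a wrapper component. This Python version only shows the arithmetic.

```python
def flatten(index, dims):
    """Row-major flattening of a multidimensional index into a single address.

    index: (i0, i1, ..., i_{n-1}), one coordinate per dimension
    dims:  (d0, d1, ..., d_{n-1}), the size of each dimension
    Assumes row-major layout (last index varies fastest).
    """
    if len(index) != len(dims):
        raise ValueError("index rank does not match memory rank")
    addr = 0
    for i, d in zip(index, dims):
        if not 0 <= i < d:
            raise IndexError(f"index {i} out of range for dimension of size {d}")
        # Horner-style accumulation: addr = ((i0 * d1 + i1) * d2 + i2) ...
        addr = addr * d + i
    return addr


print(flatten((2, 3), (4, 8)))        # 19 = 2 * 8 + 3
print(flatten((1, 2, 3), (2, 3, 4)))  # 23 = 1 * 12 + 2 * 4 + 3
```

When every inner dimension is a power of two, each multiply reduces to a shift. That case would keep the Calyx wrapper cheap.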

I think it makes sense to want to implement as much as possible in Calyx, but I wonder if this will increase the complexity unnecessarily (need to generate a primitive and then a component that wraps it). I also worry that these components that wrap primitives will be too fragile and depend on the specific synthesis tool being used. We already have to worry about fragile inference semantics within the verilog, but adding another layer seems like another thing that could break the inference.

Jun 20 '22 21:06 andrewb1999

Very much in agreement with this:

I think it makes sense to want to implement as much as possible in Calyx, but I wonder if this will increase the complexity unnecessarily

That is, I'd err on the side of putting stuff in Calyx-land, except when that seems onerous and annoying, in which case practicality prevails.

Jun 20 '22 22:06 sampsyo

@andrewb1999 Anything we need to do in the repository for this issue?

Jun 29 '22 12:06 rachitnigam

Here is a prototype of the memory primitive generation stuff: https://github.com/andrewb1999/calyx-memgen-prototype. I haven't implemented everything yet, specifically rams with a latency greater than 1 and multidimensional rams, but this should be enough to test simple dual port semantics.

I think the primary thing left here before dual port memories work fully is supporting multiple static paths through a component/primitive. It's unrealistic to have combinational reads from a memory so we need a way to describe the read_latency such that calyx can optimize these into a static fsm (extremely important for performance reasons).
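This sketch shows why a fixed read latency matters for building a static FSM. Consider a memory whose reads always take exactly `latency` cycles. A consumer can then be scheduled to pick up the data on a known cycle, with no done-signal handshake. The code below is a hypothetical Python behavioral model. It is not the Calyx interface of any existing primitive, and it assumes read-before-write behavior within a cycle.

```python
from collections import deque


class SeqReadMem:
    """Cycle-level model of a single-port memory with a fixed read latency.

    A read whose address is applied on a given tick shows up on `read_data`
    after exactly `latency` ticks, counting that tick.
    """

    def __init__(self, size, latency=1):
        if latency < 1:
            raise ValueError("latency must be >= 1 (reads are not combinational)")
        self.data = [0] * size
        # latency - 1 register stages sit between the memory array and read_data.
        self.stages = deque([None] * (latency - 1))
        self.read_data = None

    def tick(self, addr=None, write=None):
        """Advance one clock edge.

        addr:  address to read on this tick (or None for no read).
        write: optional (address, value) pair to write on this tick.
        """
        # Read-before-write: sample the array before applying this tick's write.
        sampled = self.data[addr] if addr is not None else None
        self.stages.append(sampled)
        self.read_data = self.stages.popleft()
        if write is not None:
            waddr, value = write
            self.data[waddr] = value


mem = SeqReadMem(size=8, latency=2)
mem.tick(write=(3, 42))  # write 42 to address 3
mem.tick(addr=3)         # tick 1 of the read
mem.tick()               # tick 2 of the read: data is now visible
print(mem.read_data)     # 42
```

Each extra cycle of latency here is one more register stage. That mirrors the idea of wrapping a primitive with std_reg to add latency.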

For now I'll try to write some tests for these memories and ensure everything works properly.

Jul 07 '22 21:07 andrewb1999

Also, is there some way for fud to include external verilog? (i.e. the verilog definition of the primitive) If not, that's something we should look at adding.

Jul 07 '22 21:07 andrewb1999

Awesome!!

Also, is there some way for fud to include external verilog?

So the extern keyword specifies the path to the verilog file and the Calyx compiler will attempt to link that path in. The compiler looks for imports in all the libraries specified by -l and attempts to look for verilog files only relative to the Calyx file. The logic for this is here: https://github.com/cucapra/calyx/blob/master/calyx/src/frontend/workspace.rs#L137
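As a rough illustration of the extern lookup rule described here, the function below resolves an extern Verilog path relative to the directory of the Calyx file that contains it. It is a hypothetical stand-in, not a port of the linked workspace.rs logic. Treating absolute paths as-is is an assumption.

```python
from pathlib import Path


def resolve_extern(calyx_file, extern_path):
    """Resolve the path from an `extern "..." { ... }` block against the
    directory of the Calyx file that contains it.

    Assumption: absolute paths are used unchanged.
    """
    p = Path(extern_path)
    if p.is_absolute():
        return p
    return Path(calyx_file).parent / p


# A generator could write its Verilog next to the Calyx file that references it.
print(resolve_extern("designs/top.futil", "memories/bram.sv"))  # designs/memories/bram.sv on POSIX systems
```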

We can revisit this choice if that makes certain things easier for you.

Jul 08 '22 15:07 rachitnigam

@andrewb1999 if it's okay with you, I'm going to close this issue for now. We've added a new primitive for sequential reads (#1145), and your compiler can generate new memories using the extern mechanism.

Sep 14 '22 21:09 rachitnigam
