coreir icon indicating copy to clipboard operation
coreir copied to clipboard

CoreIR clock gated registers don't get inferred by synthesis tools

Open hofstee opened this issue 6 years ago • 9 comments

When using the clock-gated registers output by CoreIR, the synthesis tools do not recognize this and put a correctly clock-gated register cell in the design. @mbstrange2 has more details.

In the memory tile this is something like a 40% reduction in power when using properly clock gated registers. Right now 80% of the power in Garnet is related to the clock and this is a large reason why.

hofstee avatar Oct 05 '19 03:10 hofstee

Since coreir does a logical clock gating (using the CE to mux between the register's current value and the update value), it won't get inferred as a real clock gate that would be inferred by the always@ description: always @(posedge clk) begin if(CE) reg <= new_value; end

I simply replaced all Register_.... modules with the always descriptions, so it's a simple fix.

mbstrange2 avatar Oct 05 '19 04:10 mbstrange2

I can provide a coreir fix for this as well (basically Max's solution). Can you send me a simple example coreir file that has this issue?

rdaly525 avatar Oct 06 '19 13:10 rdaly525

fifo.v

all the Register_has_* modules have this issue

mbstrange2 avatar Oct 07 '19 16:10 mbstrange2

After looking through the maze of code from Garnet through Gemstone, Mantle, Magma, and back to CoreIR, this is what I think needs to happen:

include/coreir/definitions/coreVerilog.hpp needs to be modified to add in register primitives that have enables and clock enables. The Verilog should look something like this:

module reg(input logic clk, rst_n,
           input logic [WIDTH-1:0]  d,
           output logic [WIDTH-1:0] d);

   always @(posedge clk, negedge rst_n) begin
      if (~rst_n) begin
         q <= init;
      end else begin
         if (clk_en && en) begin
            q <= d;
         end
      end
   end

endmodule

src/ir/headers/mantle.hpp needs to be modified to remove the code that creates muxes and use the new modules instead.

From there, mantle and magma need to be updated in a similar way so mantle doesn't create a similar mux for the clk_en.

hofstee avatar Oct 10 '19 00:10 hofstee

From what I understand, the issue is that the synthesis tool won't recognize the clock gating pattern unless it's specified in a certain form (e.g. the above verilog always block style). This seems like a target specific issue, ideally in the long term we could discover the "logical" clock gating pattern (mux on the input to the register) and replace it with a specialized version that emits the above verilog code for synthesis quality (only when we're emitting verilog). It seems that logically these two descriptions are equivalent, but one has support for a "pattern matching" mechanism in the downstream tool.

however, for now I think providing a coreir primitive with this would work as temporary fix. @rdaly525 are you thinking we just add another register primitive (or change the existing one) that emits this kind of verilog (e.g. a reg with enable primitive). This could also facilitate mapping to FPGAs where the primitives typically have enables built in, so perhaps this actually does make sense to have as a built-in primitive (rather than as an optimization as I mentioned above), although that doesn't mean we can't have an optimization that transforms logical patterns into these more specialized forms.

leonardt avatar Oct 10 '19 19:10 leonardt

Right, synthesis tools are tuned for this kind of behavioral description. When you explicitly instantiate a mux and use CE as select to that mux, synthesis will synthesize exactly that, it won't infer a real clock enable. current coreir does write enable on the register, not clock enable.

mbstrange2 avatar Oct 10 '19 19:10 mbstrange2

This is definitely a target-specific issue and should be taken care of in the backend opposed to changing the CoreIR primitives.

A good quick solution should be to define a mantle register module (declared in mantle.hpp) that has the specific interface with the CE. Then have a separate target-specific verilog definition for that mantle module with the appropriate always block. This is basically the same way we dealt with the CW floating point verilog.

rdaly525 avatar Oct 10 '19 23:10 rdaly525

Thanks @norabarlow. One questions. If we do this soon, will it help with your paper? If so, we should bump the priority.

phanrahan avatar Nov 08 '19 20:11 phanrahan

It would be helpful, but not absolutely necessary. We have written a script to add in clock gating in the generated Verilog (not a perfect solution, but it performs the functionality we want).

norabarlow avatar Nov 08 '19 21:11 norabarlow