coreir
coreir copied to clipboard
CoreIR clock gated registers don't get inferred by synthesis tools
When using the clock-gated registers output by CoreIR, the synthesis tools do not recognize this and put a correctly clock-gated register cell in the design. @mbstrange2 has more details.
In the memory tile this is something like a 40% reduction in power when using properly clock gated registers. Right now 80% of the power in Garnet is related to the clock and this is a large reason why.
Since coreir does a logical clock gating (using the CE to mux between the register's current value and the update value), it won't get inferred as a real clock gate that would be inferred by the always@ description:
always @(posedge clk) begin if(CE) reg <= new_value; end
I simply replaced all Register_.... modules with the always descriptions, so it's a simple fix.
I can provide a coreir fix for this as well (basically Max's solution). Can you send me a simple example coreir file that has this issue?
After looking through the maze of code from Garnet through Gemstone, Mantle, Magma, and back to CoreIR, this is what I think needs to happen:
include/coreir/definitions/coreVerilog.hpp needs to be modified to add in register primitives that have enables and clock enables. The Verilog should look something like this:
module reg(input logic clk, rst_n,
input logic [WIDTH-1:0] d,
output logic [WIDTH-1:0] d);
always @(posedge clk, negedge rst_n) begin
if (~rst_n) begin
q <= init;
end else begin
if (clk_en && en) begin
q <= d;
end
end
end
endmodule
src/ir/headers/mantle.hpp needs to be modified to remove the code that creates muxes and use the new modules instead.
From there, mantle and magma need to be updated in a similar way so mantle doesn't create a similar mux for the clk_en.
From what I understand, the issue is that the synthesis tool won't recognize the clock gating pattern unless it's specified in a certain form (e.g. the above verilog always block style). This seems like a target specific issue, ideally in the long term we could discover the "logical" clock gating pattern (mux on the input to the register) and replace it with a specialized version that emits the above verilog code for synthesis quality (only when we're emitting verilog). It seems that logically these two descriptions are equivalent, but one has support for a "pattern matching" mechanism in the downstream tool.
however, for now I think providing a coreir primitive with this would work as temporary fix. @rdaly525 are you thinking we just add another register primitive (or change the existing one) that emits this kind of verilog (e.g. a reg with enable primitive). This could also facilitate mapping to FPGAs where the primitives typically have enables built in, so perhaps this actually does make sense to have as a built-in primitive (rather than as an optimization as I mentioned above), although that doesn't mean we can't have an optimization that transforms logical patterns into these more specialized forms.
Right, synthesis tools are tuned for this kind of behavioral description. When you explicitly instantiate a mux and use CE as select to that mux, synthesis will synthesize exactly that, it won't infer a real clock enable. current coreir does write enable on the register, not clock enable.
This is definitely a target-specific issue and should be taken care of in the backend opposed to changing the CoreIR primitives.
A good quick solution should be to define a mantle register module (declared in mantle.hpp) that has the specific interface with the CE. Then have a separate target-specific verilog definition for that mantle module with the appropriate always block. This is basically the same way we dealt with the CW floating point verilog.
Thanks @norabarlow. One questions. If we do this soon, will it help with your paper? If so, we should bump the priority.
It would be helpful, but not absolutely necessary. We have written a script to add in clock gating in the generated Verilog (not a perfect solution, but it performs the functionality we want).