firrtl
firrtl copied to clipboard
Collapse Vectors of Single Bits to Multi-bit UInts
This adds a CollapseVectors transform that converts UInt<1>[n] into UInt<n>. This works on ports, wires, registers, nodes, and memories. It will not collapse top-level ports.
Roughly, this is running the following algorithm:
- Any declaration (wire, register, node, memory) is converted from
UInt<1>[n]toUInt<n> - Anytime a bit in a collapsed vector is assigned to (
ConnectorIsInvalid), this statement is dropped and the RHS is stored in a bit assignment data structure - When the last bit in a collapsed vector is assigned to, the full assignment is done in one of four possible ways with the following priority:
- If every bit is invalidated, then the collapsed vector (now a
UInt) is invalidated - If every bit is assigned to a literal, then the collapsed vector gets assigned to the collapsed literal
- If every bit is assigned (in order) to every bit in another collapsed vector, then the two collapsed vectors are connected directly
- Temporary wires/nodes are generated for every bit, each bit is assigned to, and the collapsed vector is assigned to via a concatenation.
- If every bit is invalidated, then the collapsed vector (now a
- Any RHS references to subindices are replaced with bit extract primops.
Async Reset Problems
This fails asynchronous reset checks due to the way that collapsed vectors are bit-assigned to.
Specifically, when collapsing a complex literal of another complex literal, this fails as the assignment is handled with a concatenation.
This likely motivates allowing resets to propagate across cat, bits, and pad. I.e., this needs the reduced version of https://github.com/freechipsproject/firrtl/pull/1201#issuecomment-549638878.
As an example, take the following circuit from AsyncResetSpec which has a register initialized to a complex literal composed of other complex literals:
circuit Test:
module Test:
input clock : Clock
input reset : AsyncReset
input x : UInt<1>[4]
output z : UInt<1>[4]
wire literal : UInt<1>[2]
literal[0] <= UInt<1>("h01")
literal[1] <= UInt<1>("h01")
wire complex_literal : UInt<1>[4]
complex_literal[0] <= literal[0]
complex_literal[1] <= literal[1]
complex_literal[2] <= UInt<1>("h00")
complex_literal[3] <= UInt<1>("h00")
reg r : UInt<1>[4], clock with : (reset => (reset, complex_literal))
r <= x
z <= r
This produces the following which fails CheckResets due to the concatenation:
circuit Test :
module Test :
input clock : Clock
input reset : AsyncReset
input x : UInt<1>[4]
output z : UInt<1>[4]stOnly 4s
wire bundle : { a : UInt<1>, b : UInt<1>}
wire complex_literal : UInt<4>
reg r : UInt<4>, clock with :
reset => (reset, complex_literal)
z[0] <= bits(r, 0, 0)
z[1] <= bits(r, 1, 1)
z[2] <= bits(r, 2, 2)
z[3] <= bits(r, 3, 3)
bundle.a <= UInt<1>("h1")
bundle.b <= UInt<1>("h1")
node _complex_literal_0 = bundle.a
node _complex_literal_1 = bundle.b
node _complex_literal_2 = UInt<1>("h0")
node _complex_literal_3 = UInt<1>("h0")
complex_literal <= cat(cat(_complex_literal_3, _complex_literal_2), cat(_complex_literal_1, _complex_literal_0))
node _r_0 = x[0]
node _r_1 = x[1]
node _r_2 = x[2]
node _r_3 = x[3]
r <= cat(cat(_r_3, _r_2), cat(_r_1, _r_0))
Todo
- [ ] Relax constraints on asynchronous reset checks
- [ ] Support
DontTouchAnnotation - [ ] Don't collapse
ExtModuleports
Metadata
Fixes #99
Performance
This should be O(n) as it's walking the AST twice (once to build an instance graph and once to process it).
Running this on Rocket Chip master is "not too bad":
======== Starting Transform CollapseVectors ========
----------------------------------------------------
Time: 790.0 ms
Form: UnknownForm
======== Finished Transform CollapseVectors ========
QOR
Here's a small example of what this is doing in Rocket Chip:
module IntXbar :
input clock : Clock
input reset : UInt<1>
output auto : {flip int_in : UInt<1>[2], int_out : UInt<1>[2]}
clock is invalid
reset is invalid
auto is invalid
wire _T : UInt<1>[2] @[Nodes.scala 369:76]
_T is invalid @[Nodes.scala 369:76]
wire _T_1 : UInt<1>[2] @[Nodes.scala 368:76]
_T_1 is invalid @[Nodes.scala 368:76]
auto.int_out <- _T_1 @[LazyModule.scala 173:49]
_T <- auto.int_in @[LazyModule.scala 173:31]
_T_1[0] <= _T[0] @[Xbar.scala 21:44]
_T_1[1] <= _T[1] @[Xbar.scala 21:44]
Without this PR, you get:
module IntXbar( // @[:[email protected]]
input auto_int_in_0, // @[:[email protected]]
input auto_int_in_1, // @[:[email protected]]
output auto_int_out_0, // @[:[email protected]]
output auto_int_out_1 // @[:[email protected]]
);
assign auto_int_out_0 = auto_int_in_0; // @[LazyModule.scala 173:49:[email protected]]
assign auto_int_out_1 = auto_int_in_1; // @[LazyModule.scala 173:49:[email protected]]
endmodule
With this PR you get:
module IntXbar( // @[:[email protected]]
input [1:0] auto_int_in, // @[:[email protected]]
output [1:0] auto_int_out // @[:[email protected]]
);
assign auto_int_out = auto_int_in;
endmodule
Overall Verilog line count changes are about flat. No discernible change in Verilator simulation time.
# git diff --no-index --stat freechips.rocketchip.system.DefaultConfig.orig.v freechips.rocketchip.system.DefaultConfig.v
... => freechips.rocketchip.system.DefaultConfig.v | 31658 +++++++++----------
1 file changed, 15608 insertions(+), 16050 deletions(-)
From the dev meeting: We can see if simply relaxing CheckResets will work. However, this should be tested that it works with both -X verilog and -X mverilog.
I would like this to work on top level ports. A specific use case is to hool up a Vec(n, Bool()) byteenable signal to Avalon MM, in which case Avalon MM expects the vectpr of bools as a single signal.
Data point: I switched from Vec(n, Bool()) to UInt() for byte enable write masks in my design and it reduced Verilog files I generated from 250000 lines to 200000 lines.