firrtl icon indicating copy to clipboard operation
firrtl copied to clipboard

Collapse Vectors of Single Bits to Multi-bit UInts

Open seldridge opened this issue 5 years ago • 3 comments

This adds a CollapseVectors transform that converts UInt<1>[n] into UInt<n>. This works on ports, wires, registers, nodes, and memories. It will not collapse top-level ports.

Roughly, this is running the following algorithm:

  1. Any declaration (wire, register, node, memory) is converted from UInt<1>[n] to UInt<n>
  2. Anytime a bit in a collapsed vector is assigned to (Connect or IsInvalid), this statement is dropped and the RHS is stored in a bit assignment data structure
  3. When the last bit in a collapsed vector is assigned to, the full assignment is done in one of four possible ways with the following priority:
    1. If every bit is invalidated, then the collapsed vector (now a UInt) is invalidated
    2. If every bit is assigned to a literal, then the collapsed vector gets assigned to the collapsed literal
    3. If every bit is assigned (in order) to every bit in another collapsed vector, then the two collapsed vectors are connected directly
    4. Temporary wires/nodes are generated for every bit, each bit is assigned to, and the collapsed vector is assigned to via a concatenation.
  4. Any RHS references to subindices are replaced with bit extract primops.

Async Reset Problems

This fails asynchronous reset checks due to the way that collapsed vectors are bit-assigned to.

Specifically, when collapsing a complex literal of another complex literal, this fails as the assignment is handled with a concatenation.

This likely motivates allowing resets to propagate across cat, bits, and pad. I.e., this needs the reduced version of https://github.com/freechipsproject/firrtl/pull/1201#issuecomment-549638878.

As an example, take the following circuit from AsyncResetSpec which has a register initialized to a complex literal composed of other complex literals:

circuit Test:
  module Test:
    input clock : Clock
    input reset : AsyncReset
    input x : UInt<1>[4]
    output z : UInt<1>[4]
    wire literal : UInt<1>[2]
    literal[0] <= UInt<1>("h01")
    literal[1] <= UInt<1>("h01")
    wire complex_literal : UInt<1>[4]
    complex_literal[0] <= literal[0]
    complex_literal[1] <= literal[1]
    complex_literal[2] <= UInt<1>("h00")
    complex_literal[3] <= UInt<1>("h00")
    reg r : UInt<1>[4], clock with : (reset => (reset, complex_literal))
    r <= x
    z <= r

This produces the following which fails CheckResets due to the concatenation:

circuit Test :
  module Test :
    input clock : Clock
    input reset : AsyncReset
    input x : UInt<1>[4]
    output z : UInt<1>[4]stOnly 4s
  
    wire bundle : { a : UInt<1>, b : UInt<1>}
    wire complex_literal : UInt<4>
    reg r : UInt<4>, clock with :
      reset => (reset, complex_literal)
    z[0] <= bits(r, 0, 0)
    z[1] <= bits(r, 1, 1)
    z[2] <= bits(r, 2, 2)
    z[3] <= bits(r, 3, 3)
    bundle.a <= UInt<1>("h1")
    bundle.b <= UInt<1>("h1")
    node _complex_literal_0 = bundle.a
    node _complex_literal_1 = bundle.b
    node _complex_literal_2 = UInt<1>("h0")
    node _complex_literal_3 = UInt<1>("h0")
    complex_literal <= cat(cat(_complex_literal_3, _complex_literal_2), cat(_complex_literal_1, _complex_literal_0))
    node _r_0 = x[0]
    node _r_1 = x[1]
    node _r_2 = x[2]
    node _r_3 = x[3]
    r <= cat(cat(_r_3, _r_2), cat(_r_1, _r_0))

Todo

  • [ ] Relax constraints on asynchronous reset checks
  • [ ] Support DontTouchAnnotation
  • [ ] Don't collapse ExtModule ports

Metadata

Fixes #99

Performance

This should be O(n) as it's walking the AST twice (once to build an instance graph and once to process it).

Running this on Rocket Chip master is "not too bad":

======== Starting Transform CollapseVectors ========
----------------------------------------------------

Time: 790.0 ms
Form: UnknownForm
======== Finished Transform CollapseVectors ========

QOR

Here's a small example of what this is doing in Rocket Chip:

  module IntXbar : 
    input clock : Clock
    input reset : UInt<1>
    output auto : {flip int_in : UInt<1>[2], int_out : UInt<1>[2]}
    
    clock is invalid
    reset is invalid
    auto is invalid
    wire _T : UInt<1>[2] @[Nodes.scala 369:76]
    _T is invalid @[Nodes.scala 369:76]
    wire _T_1 : UInt<1>[2] @[Nodes.scala 368:76]
    _T_1 is invalid @[Nodes.scala 368:76]
    auto.int_out <- _T_1 @[LazyModule.scala 173:49]
    _T <- auto.int_in @[LazyModule.scala 173:31]
    _T_1[0] <= _T[0] @[Xbar.scala 21:44]
    _T_1[1] <= _T[1] @[Xbar.scala 21:44]

Without this PR, you get:

module IntXbar( // @[:[email protected]]
  input   auto_int_in_0, // @[:[email protected]]
  input   auto_int_in_1, // @[:[email protected]]
  output  auto_int_out_0, // @[:[email protected]]
  output  auto_int_out_1 // @[:[email protected]]
 );
  assign auto_int_out_0 = auto_int_in_0; // @[LazyModule.scala 173:49:[email protected]]
  assign auto_int_out_1 = auto_int_in_1; // @[LazyModule.scala 173:49:[email protected]]
endmodule

With this PR you get:

module IntXbar( // @[:[email protected]]                                                                                                                                                                                           
  input  [1:0] auto_int_in, // @[:[email protected]]                                                                                                                                                                               
  output [1:0] auto_int_out // @[:[email protected]]                                                                                                                                                                               
);                                                                                                                                                                                                                                                                 
  assign auto_int_out = auto_int_in;                                                                                                                                                                                                                               
endmodule 

Overall Verilog line count changes are about flat. No discernible change in Verilator simulation time.

# git diff --no-index --stat freechips.rocketchip.system.DefaultConfig.orig.v freechips.rocketchip.system.DefaultConfig.v
 ... => freechips.rocketchip.system.DefaultConfig.v | 31658 +++++++++----------
 1 file changed, 15608 insertions(+), 16050 deletions(-)

seldridge avatar Nov 01 '19 03:11 seldridge

From the dev meeting: We can see if simply relaxing CheckResets will work. However, this should be tested that it works with both -X verilog and -X mverilog.

seldridge avatar Nov 18 '19 21:11 seldridge

I would like this to work on top level ports. A specific use case is to hool up a Vec(n, Bool()) byteenable signal to Avalon MM, in which case Avalon MM expects the vectpr of bools as a single signal.

oharboe avatar Jan 04 '20 11:01 oharboe

Data point: I switched from Vec(n, Bool()) to UInt() for byte enable write masks in my design and it reduced Verilog files I generated from 250000 lines to 200000 lines.

oharboe avatar Jan 04 '20 17:01 oharboe