
Reducing control fan-out

Status: Open. rachitnigam opened this issue 4 years ago (5 comments).

Control signals in Calyx programs often have high fan-out. For example, in a par statement with n children, the go signal of the group that implements the par is forwarded to all n children, giving it a fan-out of n.

With the NTT pipelines, which execute many simple groups in the same par block (and with even larger systolic arrays), this will quickly become a problem.

A possible solution is to trade latency for lower fan-out by inserting registers that forward the control signals:

  1. Given a par block with n children, instantiate two control registers and connect the par's go signal to both registers.
  2. Partition the children into two groups of n/2 control statements. Each half gets its go signal from one of the two registers.
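A minimal Python sketch of the two steps above. It models the control-signal distribution as a tree instead of real Calyx cells, so `Node`, `split_in_two`, and the register names `go_reg0`/`go_reg1` are hypothetical names for illustration, not anything in the Calyx compiler.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One point in a hypothetical control-signal distribution tree.

    `name` is the par's go signal, a forwarding register, or a child;
    `sinks` are the nodes this node's output drives directly.
    """
    name: str
    sinks: list = field(default_factory=list)


def split_in_two(go: str, children: list) -> Node:
    """Forward `go` through two registers, each driving half of the children."""
    half = (len(children) + 1) // 2  # first register takes the extra child if n is odd
    reg0 = Node("go_reg0", [Node(c) for c in children[:half]])
    reg1 = Node("go_reg1", [Node(c) for c in children[half:]])
    return Node(go, [reg0, reg1])


def max_fanout(node: Node) -> int:
    """Largest number of sinks driven by any single node in the tree."""
    return max([len(node.sinks)] + [max_fanout(s) for s in node.sinks])


children = [f"g{i}" for i in range(8)]
flat = Node("par_go", [Node(c) for c in children])
tree = split_in_two("par_go", children)
print(max_fanout(flat), max_fanout(tree))  # 8 4
```

With eight children, the flat par's go signal drives 8 sinks; after the split, the par's go drives only the two registers and each register drives 4 children, at the cost of one register stage.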

This slows down the par block by one cycle but reduces fan-out by a factor of two. In general, given a maximum fan-out m (specified by an attribute, the target, or a compiler flag), this pass can spend roughly log_m(n) additional cycles to break up the control signal.
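The general case can be sketched as a register tree in which no node drives more than m sinks. This assumes (my assumption, not a description of an existing pass) that the par's own go signal may also drive up to m registers and that each level of forwarding registers costs exactly one cycle. Under that counting the number of extra cycles is ceil(log_m n) - 1, which stays within one cycle of the log_m(n) estimate. `register_levels` and `build_register_tree` are hypothetical helpers.

```python
def register_levels(n: int, m: int) -> int:
    """Register levels (one extra cycle each) so no signal drives more than m sinks."""
    if m < 2:
        raise ValueError("maximum fan-out must be at least 2")
    levels, reach = 0, m  # with no registers, the go signal reaches m children
    while reach < n:
        reach *= m  # each extra register level multiplies the reach by m
        levels += 1
    return levels


def build_register_tree(go: str, children: list, m: int) -> dict:
    """Group children under forwarding registers, m at a time, until the root drives <= m nodes."""
    if m < 2:
        raise ValueError("maximum fan-out must be at least 2")
    nodes = [{"name": c, "sinks": []} for c in children]
    next_reg = 0
    while len(nodes) > m:
        grouped = []
        for i in range(0, len(nodes), m):
            grouped.append({"name": f"fwd_reg{next_reg}", "sinks": nodes[i:i + m]})
            next_reg += 1
        nodes = grouped
    return {"name": go, "sinks": nodes}


def max_fanout(node: dict) -> int:
    """Largest number of sinks driven by any single node."""
    return max([len(node["sinks"])] + [max_fanout(s) for s in node["sinks"]])


def depth(node: dict) -> int:
    """Number of edges on the longest path from `node` down to a child."""
    return 0 if not node["sinks"] else 1 + max(depth(s) for s in node["sinks"])


tree = build_register_tree("par_go", [f"g{i}" for i in range(16)], m=4)
print(register_levels(16, 4), max_fanout(tree), depth(tree) - 1)  # 1 4 1
```

Greedy grouping keeps every child at the same depth, so all children see their go signal in the same cycle; a cost-model-driven pass might instead balance the groups differently.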

This is partly inspired by a conversation with @zhangzhiru on control pipelining. Their problem is harder because they need to forward the signal within the context of pipelines. However, I think this could be a good basis for giving frontends or compiler toolchains a way to guarantee the synthesizability of Calyx designs.

rachitnigam, Feb 01 '21 17:02

This is a good idea IMO. Just a couple of disconnected thoughts:

  • Add this to the list of passes (with resource sharing & register minimization) that ideally want some sort of technology-specific cost model as a heuristic guide, especially if the "register tree" involved needs a configurable width & depth.
  • Like some of those other passes, it would be nice to have a way to assess the need for it by measuring something about the unoptimized design. Are there convenient ways to find bad fan-outs in a netlist and to blame synthesis/timing failures on them? (I don't know the answer to this.)
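On the measurement question, one simple starting point is to count the loads on each net of a flattened netlist and flag nets above a threshold. This sketch assumes a hypothetical netlist format (a list of (net, sink pin) pairs); extracting that from a real synthesis tool's output, and blaming timing failures on the flagged nets, is left open.

```python
from collections import Counter


def high_fanout_nets(connections, threshold: int) -> list:
    """Return (net, fan-out) pairs whose fan-out exceeds `threshold`, worst first.

    `connections` is a hypothetical flattened netlist: an iterable of
    (net_name, sink_pin) pairs, one entry per load the net drives.
    """
    fanout = Counter(net for net, _sink in connections)
    flagged = [(net, count) for net, count in fanout.items() if count > threshold]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


netlist = [("par0_go", f"g{i}.go") for i in range(16)]
netlist += [("fsm_state", "fsm.in"), ("fsm_state", "g0.cond")]
print(high_fanout_nets(netlist, threshold=8))  # [('par0_go', 16)]
```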

sampsyo, Feb 02 '21 01:02

Experiment to see whether fan-out problems can be fixed this way:

  • Write a generator for high-fan-out programs (with a parameter that controls the fan-out).
  • Run synthesis with increasing fan-out until the design fails timing.
  • See whether increasing the nesting fixes this.
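A sketch of the generator from the first step: it emits a Calyx program whose control is a single par over n groups, each writing a constant into its own register, so the par's go signal has a fan-out of n. The function name is hypothetical, and the import path is an assumption (the standard-primitives path has changed across Calyx versions), so it is exposed as a parameter.

```python
def high_fanout_program(n: int, width: int = 32,
                        prelude: str = "primitives/core.futil") -> str:
    """Emit a Calyx program whose single par has n children (fan-out n).

    Assumes n < 2**width so every constant fits in its register.
    """
    # One register per child so that every group drives its own cell.
    cells = "\n".join(f"    r{i} = std_reg({width});" for i in range(n))
    # Each group writes a constant into its register and finishes when the write lands.
    groups = "\n".join(
        f"    group g{i} {{\n"
        f"      r{i}.in = {width}'d{i};\n"
        f"      r{i}.write_en = 1'd1;\n"
        f"      g{i}[done] = r{i}.done;\n"
        f"    }}"
        for i in range(n)
    )
    children = " ".join(f"g{i};" for i in range(n))
    return (
        f'import "{prelude}";\n'
        "component main() -> () {\n"
        f"  cells {{\n{cells}\n  }}\n"
        f"  wires {{\n{groups}\n  }}\n"
        f"  control {{\n    par {{ {children} }}\n  }}\n"
        "}\n"
    )


if __name__ == "__main__":
    print(high_fanout_program(4))
```

Sweeping n (for example over powers of two) and synthesizing each output would give the fan-out axis for the experiment.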

We also realized that we don't need a whole new pass to do this. We can just disable/undo the effect of collapse-control and let the compilation for par { par { ... }; par { ... } } generate the additional structure.
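The nested form can be produced mechanically. This hypothetical helper rewrites a flat par over n enables into nested pars with at most m children each, emitting Calyx-style control text (nested blocks without trailing semicolons, which I believe matches Calyx's control syntax). As noted above, collapse-control would have to be disabled or undone for this nesting to survive compilation.

```python
def nest_par(children: list, m: int) -> str:
    """Rewrite `par { c0; ...; c(n-1); }` into nested pars with at most m children each."""
    if m < 2:
        raise ValueError("maximum width must be at least 2")
    stmts = [f"{c};" for c in children]  # leaf enables end with ';'
    while len(stmts) > m:
        # Wrap each run of m statements in its own par; nested blocks take no ';'.
        stmts = ["par { " + " ".join(stmts[i:i + m]) + " }"
                 for i in range(0, len(stmts), m)]
    return "par { " + " ".join(stmts) + " }"


print(nest_par(["g0", "g1", "g2", "g3"], m=2))
# par { par { g0; g1; } par { g2; g3; } }
```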

rachitnigam, Feb 04 '21 02:02