
Smart seq split

Open parthsarkar17 opened this issue 7 months ago • 6 comments

Similar to duplication, the goal of splitting a seq block is to reduce fan-out from the single register that normally controls a sequential schedule. Initially, we tried to transform:

seq { A; B; C; D; E; F; }

into:

seq { 
    @new_fsm seq {A; B; C;}
    @new_fsm seq {D; E; F;}
}
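The split above can be sketched as a small Python helper. This only illustrates the shape of the transformation, not the actual compiler pass: `split_seq` and its `num_chunks` parameter are hypothetical names, and the helper works on plain strings rather than on Calyx's control representation.

```python
def split_seq(children, num_chunks=2):
    """Return the text of a seq whose children are grouped into nested seqs.

    Hypothetical helper: `children` is a list of group names, and each
    contiguous chunk becomes one `@new_fsm seq` child of the outer seq.
    """
    size = -(-len(children) // num_chunks)  # ceiling division
    chunks = [children[i:i + size] for i in range(0, len(children), size)]
    inner = [
        f"    @new_fsm seq {{ {' '.join(c + ';' for c in chunk)} }}"
        for chunk in chunks
    ]
    return "seq {\n" + "\n".join(inner) + "\n}"


if __name__ == "__main__":
    # Prints the outer seq with two @new_fsm children, as in the example above.
    print(split_seq(["A", "B", "C", "D", "E", "F"]))
```

Each `@new_fsm` child gets its own control register, which is what spreads the fan-out across more than one register.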

The child schedules generated control registers like so:

@generated fsm0 = std_reg(2);
@generated fsm1 = std_reg(2);

which was the goal. However, this approach also generated the following groups and assignments:

group tdcc0 {
    A[go] = ...
    ...
    fsm0.in = fsm0.out == 2'd0 & A[done] ? 2'd1; // line 1
    ...
}
group tdcc1 {
    D[go] = ...
    ...
    fsm1.in = fsm1.out == 2'd0 & D[done] ? 2'd1; // line 2
    ...
}

Lines 1 and 2 are nearly identical: each checks the current state of its FSM register, waits for the controlled group to finish, and then writes the next state into that register (the same value for both registers!). Once we got synthesis results from this "new-fsm-insertion" method, we saw that worst slack (WS) decreased and LUT usage increased. We suspected this was because the logic for transitioning between FSM states is duplicated: fsm0 and fsm1 have identical transition conditions (up to the names of the groups they control) and identical next-state values.

So, we decided to add an option to TDCC that lets a seq block be controlled by a parent register, a child register, and a duplicate of each (which agrees with its original at every cycle). This should give us the benefits of reduced fan-out while reusing logic wherever we can. Here's what the tdcc group looks like for the above control block:

group tdcc {
    A[go] = !A[done] & parent0 == 1'd0 & fsm0 == 2'd0;

    ...  // notice how the enable queries are split among the two sets of registers. that's for fan-out reduction

    F[go] = !F[done] & parent1 == 1'd1 & fsm1 == 2'd2;
    
    ... // notice how transition logic is shared and not duplicated

    fsm0.in = A[done] & parent0 == 1'd0 & fsm0 == 2'd0 ? 2'd1;
    fsm1.in = A[done] & parent0 == 1'd0 & fsm0 == 2'd0 ? 2'd1;
   
    ... // transition logic for parents isn't shown
}
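To make the lockstep behavior concrete, here is a minimal cycle-level model in Python. It is a sketch under stated assumptions, not what TDCC emits: the parent transition (reset the child and advance the parent once the last group of a chunk finishes) is assumed, since the snippet above doesn't show it, and `simulate`, `enabled_group`, and the `latency` map are hypothetical names. Register widths are not modeled.

```python
GROUPS = ["A", "B", "C", "D", "E", "F"]
CHUNK = 3  # groups per child schedule, as in the example above


def enabled_group(parent0, fsm0, parent1, fsm1):
    """Enable queries are split: chunk 0 reads the originals, chunk 1 the duplicates."""
    for i, group in enumerate(GROUPS):
        chunk, state = divmod(i, CHUNK)
        parent, fsm = (parent0, fsm0) if chunk % 2 == 0 else (parent1, fsm1)
        if parent == chunk and fsm == state:
            return group
    return None


def simulate(latency):
    """Run all groups in order; return (completion order, cycle count).

    `latency[g]` is the number of cycles group g runs before its done is high.
    """
    parent0 = parent1 = 0  # parent register and its duplicate
    fsm0 = fsm1 = 0        # child register and its duplicate
    progress, cycles, trace = 0, 0, []
    while parent0 < len(GROUPS) // CHUNK:
        group = enabled_group(parent0, fsm0, parent1, fsm1)
        cycles += 1
        progress += 1
        if progress == latency[group]:  # group's done signal is high
            trace.append(group)
            progress = 0
            # Shared transition logic: the next state is computed once from
            # the originals and written to both copies of each register.
            last_in_chunk = fsm0 == CHUNK - 1  # assumed parent transition
            next_fsm = 0 if last_in_chunk else fsm0 + 1
            next_parent = parent0 + 1 if last_in_chunk else parent0
            fsm0 = fsm1 = next_fsm
            parent0 = parent1 = next_parent
        # Duplicates must agree with their originals at every cycle.
        assert fsm0 == fsm1 and parent0 == parent1
    return trace, cycles


if __name__ == "__main__":
    print(simulate({g: 1 for g in GROUPS}))
```

Because both copies of a register are driven by the same computed next state, they cannot diverge, while the enable queries for the two halves of the schedule read different physical registers.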

In short, the idea is duplication, but with an emphasis on making sure the registers reuse logic to update themselves. Benchmarking + synthesis results are in progress.

parthsarkar17, Jul 22 '24 20:07