calyx First Class FSMs

First Class FSMs

Open parthsarkar17 opened this issue 4 months ago • 12 comments

Idea

Here's the initial discussion on Zulip, which touches on the motivation for this.

The idea is to be able to instantiate FSMs as a group-like construct in the wires section of a Calyx program. In particular, these FSM constructs might let users specify:

Their own symbolic states
(conditional) Transitions from one state to another (in a case-like block)
(conditional) Assignments that are guarded by the current state
Groups that trigger at the current state

Motivation

As the original post mentions, this was largely motivated by the idea that the Vivado synthesis tool is able to specifically infer a finite state machine construct in Verilog when FSMs are compiled to a Verilog case block. Here's what our FSM transition assignment looked like before:

assign fsm2_in =_guard485 ? 3'd6 : _guard490 ? 3'd5 : _guard497 ? 3'd2 : ... 3'd0

and here's roughly what Vivado expects in order to infer an FSM;

always @ (*) begin
    case (fsm2_out)
       ...
       3'd2 : begin
            if (raise_err_done_out) begin
                fsm2_in = 3'd6;
            end
            else begin
                fsm2_in = 3'd2;
            end
        end
        3'd3 : begin
            if (read_payload_from_mem_pop_done_out) begin
                fsm2_in = 3'd4;
            end
            else begin
                fsm2_in = 3'd3;
            end
        end
      ...

We want Vivado to recognize our register as an FSM because it can do some cool optimizations on it to reduce resources and potentially increase frequency. We're guessing the reason Vivado prefers the second construction is because it can list all transitions, guaranteeing a "finite" number of states, which wasn't necessarily clear from the original transitions specified by the assign.

So, if we want to emit these case statements via the Verilog backend, it would be nice to have an IR construct within the compiler to store these transitions. This prompted the idea to let users define FSMs themselves.

Example

Here's a small example I was working with to map out what a hand-designed FSM in Calyx might look like. It's a silly multiplier for positive integers that adds an input a_in to a result accumulator b_in times, but it importantly contains a loop and conditional transitions. The control looks like this:

control {
    loada;
    loadb;
    while gt.out with cond {
      resincr;
      bdecr;
    }
  }

which currently compiles to 7 states. One for each of loada ( s0 ) and loadb ( s1 ), one for the generated group that updates the comb_reg containing whether b.out > 0 ( s2 ), one for each of updating the accumulator ( s3 ) and b registers ( s4 ), another to once again update comb_reg ( s5 ), and finally the done state ( s6 ).

Here's how I was imagining the equivalent with a new fsm construct, stealing inspiration from what @sampsyo posted on the Zulip thread. For this example, I decided to have only one state in which comb_reg is updated. Also, while I've listed states as integers, I'm thinking of them as symbolic states.

Within the wires section, we might have something like:

fsm myfsm {
    0: { loada; } => <*> 1, 
    1: { loadb; } => 2,
    2: { comb_cond_update; } => {
            comb_reg.out: 3, <**>
           !comb_reg.out: 5,
        },
    3: { resincr; } => 4,
    4: { bdecr; } => 2,
    5: => 0
}
* implicit loada.done guard here, plus implicit self-loop if any transition condition not met
** again, implicit guard of comb_cond_update.done, with self-loop

Note that user-defined transition conditions should be exclusive, so the FSM remains deterministic. At least in this example, it seems that we can more specifically expect cond1 xor cond2 xor ... xor cond3 to be true, meaning the only condition on a self-loop will be if the group invoked in that state has a low done signal.

When we compile these fsm constructs to Verilog, we can indeed have a case statement that matches on the current state of the fsm register. Then, within each branch of this RTL case block, we would have an if/else if statement for each condition for a transition (e.g.comb_reg.out from above), which we would guard with the done signal of the group invoked in that state. Then, the else statement (whose purpose is to encode the implicit self-loop mentioned above) would only occur if the done signal of the invoked group is not high.

We would let the assignments within a group be compiled like normal, with one key difference: the only compiler-generated condition on whether an assignment within a group can activate is the state of the FSM from which the group is invoked. Here's why that's different from how we currently do it, and why we think this modification maintains correctness:

Currently, a group with a control block has its assignments is guarded by 1) the state of the FSM fsm.out and 2) tdcc.go. Further, the number of guards increases with nested control that offloads FSMs because you'd instantiate a tdcc group for each new FSM. Based on some discussions during the calyx-opt meeting, the checks of tdcc{x}.go seem unnecessary: if an FSM is currently in the middle of execution, there is never a point at which tdcc.go becomes low. It might offload to another FSM, but the tdcc.go signal remains high. Based on this idea, with nested control and offloaded FSMs, we need only check the "leaf" FSM state, and none of the "node" (i.e. parent) FSM states and none of the tdcc{x}.go signals.

Now, our new fsm construct idea doesn't use tdcc-like generated groups. But, we are thinking of allowing one FSM to call another FSM, similar to a group invoke. We think we can take advantage of this to dramatically reduce assignment guards, and thereby reduce LUT usage!

Brief Static FSM Interlude

I agree with the idea that static fsm constructs should be distinct from dynamic ones. Given that assumption, we could create a new fsm construct for static FSMs, where implicit group.done guards are not added to transition conditions. Since each group will have a latency attached to it, we could compile our static FSMs the same way, where each state is a cycle.

For example,

staticfsm my_fsm {
    s0 : { < 3 cycle group > } => s1,
    s1 : { < 1 cycle group > } => s2,
    s2 : ... => ...
}

would compile to:

always @ (*) begin
    case (fsm_out)
       4'd0 : fsm_in = 4'd1; // s0 begins here
       4'd1 : fsm_in = 4'd2;
       4'd2 : fsm_in = 4'd3;
       4'd3 : fsm_in = 4'd4 // s1 begins here
       4'd4 : fsm_in = 3'd5 // s2 begins here
       ...

Static repeats are the interesting bit. We surely wouldn't want to make users create distinct states for the same loop body, so maybe there can be a repeat construct within the staticfsm block somehow?

Open Questions

Do we allow arbitrary assignments instead of just group/fsm invokes?
- One issue might be semantics around implicit group.done guards in the transitions, since, now, it may not be necessary to invoke a group at a state. Static FSMs might get around this by having a different fsm construct specific to static FSMs. Should we do the same and have an "assignment FSM" construct?
How do we allow specification for symbolic representations of state?
Right now, my idea of a user-defined FSM is just "execute this group at this state". What if users have a different idea of what an FSM means?
- I'm thinking of an IDLE, CALC, DONE state situation, where control signals are accepted and emitted at every cycle. This probably goes back to allowing arbitrary assignments.

Oct 05 '24 18:10 parthsarkar17

calyx calyx copied to clipboard

First Class FSMs

Idea

Motivation

Example

Brief Static FSM Interlude

Open Questions

calyx
calyx copied to clipboard