calyx
calyx copied to clipboard
First Class FSMs
Idea
Here's the initial discussion on Zulip, which touches on the motivation for this.
The idea is to be able to instantiate FSMs as a group
-like construct in the wires
section of a Calyx program. In particular, these FSM constructs might let users specify:
- Their own symbolic states
- (conditional) Transitions from one state to another (in a
case
-like block) - (conditional) Assignments that are guarded by the current state
- Groups that trigger at the current state
Motivation
As the original post mentions, this was largely motivated by the idea that the Vivado synthesis tool is able to specifically infer a finite state machine construct in Verilog when FSMs are compiled to a Verilog case
block. Here's what our FSM transition assignment looked like before:
assign fsm2_in =_guard485 ? 3'd6 : _guard490 ? 3'd5 : _guard497 ? 3'd2 : ... 3'd0
and here's roughly what Vivado expects in order to infer an FSM;
always @ (*) begin
case (fsm2_out)
...
3'd2 : begin
if (raise_err_done_out) begin
fsm2_in = 3'd6;
end
else begin
fsm2_in = 3'd2;
end
end
3'd3 : begin
if (read_payload_from_mem_pop_done_out) begin
fsm2_in = 3'd4;
end
else begin
fsm2_in = 3'd3;
end
end
...
We want Vivado to recognize our register as an FSM because it can do some cool optimizations on it to reduce resources and potentially increase frequency. We're guessing the reason Vivado prefers the second construction is because it can list all transitions, guaranteeing a "finite" number of states, which wasn't necessarily clear from the original transitions specified by the assign
.
So, if we want to emit these case
statements via the Verilog backend, it would be nice to have an IR construct within the compiler to store these transitions. This prompted the idea to let users define FSMs themselves.
Example
Here's a small example I was working with to map out what a hand-designed FSM in Calyx might look like. It's a silly multiplier for positive integers that adds an input a_in
to a result accumulator b_in
times, but it importantly contains a loop and conditional transitions. The control looks like this:
control {
loada;
loadb;
while gt.out with cond {
resincr;
bdecr;
}
}
which currently compiles to 7 states. One for each of loada
( s0
) and loadb
( s1
), one for the generated group that updates the comb_reg
containing whether b.out > 0
( s2
), one for each of updating the accumulator ( s3
) and b
registers ( s4
), another to once again update comb_reg
( s5
), and finally the done
state ( s6
).
Here's how I was imagining the equivalent with a new fsm
construct, stealing inspiration from what @sampsyo posted on the Zulip thread. For this example, I decided to have only one state in which comb_reg
is updated. Also, while I've listed states as integers, I'm thinking of them as symbolic states.
Within the wires
section, we might have something like:
fsm myfsm {
0: { loada; } => <*> 1,
1: { loadb; } => 2,
2: { comb_cond_update; } => {
comb_reg.out: 3, <**>
!comb_reg.out: 5,
},
3: { resincr; } => 4,
4: { bdecr; } => 2,
5: => 0
}
* implicit loada.done guard here, plus implicit self-loop if any transition condition not met
** again, implicit guard of comb_cond_update.done, with self-loop
Note that user-defined transition conditions should be exclusive, so the FSM remains deterministic. At least in this example, it seems that we can more specifically expect cond1 xor cond2 xor ... xor cond3
to be true, meaning the only condition on a self-loop will be if the group invoked in that state has a low done
signal.
When we compile these fsm
constructs to Verilog, we can indeed have a case
statement that matches on the current state of the fsm
register. Then, within each branch of this RTL case block, we would have an if/else if
statement for each condition for a transition (e.g.comb_reg.out
from above), which we would guard with the done
signal of the group invoked in that state. Then, the else statement (whose purpose is to encode the implicit self-loop mentioned above) would only occur if the done
signal of the invoked group is not high.
We would let the assignments within a group be compiled like normal, with one key difference: the only compiler-generated condition on whether an assignment within a group can activate is the state of the FSM from which the group is invoked. Here's why that's different from how we currently do it, and why we think this modification maintains correctness:
Currently, a group with a control block has its assignments is guarded by 1) the state of the FSM fsm.out
and 2) tdcc.go
. Further, the number of guards increases with nested control that offloads FSMs because you'd instantiate a tdcc
group for each new FSM. Based on some discussions during the calyx-opt
meeting, the checks of tdcc{x}.go
seem unnecessary: if an FSM is currently in the middle of execution, there is never a point at which tdcc.go
becomes low. It might offload to another FSM, but the tdcc.go
signal remains high. Based on this idea, with nested control and offloaded FSMs, we need only check the "leaf" FSM state, and none of the "node" (i.e. parent) FSM states and none of the tdcc{x}.go
signals.
Now, our new fsm
construct idea doesn't use tdcc
-like generated groups. But, we are thinking of allowing one FSM to call another FSM, similar to a group invoke. We think we can take advantage of this to dramatically reduce assignment guards, and thereby reduce LUT usage!
Brief Static FSM Interlude
I agree with the idea that static fsm
constructs should be distinct from dynamic ones. Given that assumption, we could create a new fsm
construct for static FSMs, where implicit group.done
guards are not added to transition conditions. Since each group will have a latency attached to it, we could compile our static FSMs the same way, where each state is a cycle.
For example,
staticfsm my_fsm {
s0 : { < 3 cycle group > } => s1,
s1 : { < 1 cycle group > } => s2,
s2 : ... => ...
}
would compile to:
always @ (*) begin
case (fsm_out)
4'd0 : fsm_in = 4'd1; // s0 begins here
4'd1 : fsm_in = 4'd2;
4'd2 : fsm_in = 4'd3;
4'd3 : fsm_in = 4'd4 // s1 begins here
4'd4 : fsm_in = 3'd5 // s2 begins here
...
Static repeats are the interesting bit. We surely wouldn't want to make users create distinct states for the same loop body, so maybe there can be a repeat
construct within the staticfsm
block somehow?
Open Questions
- Do we allow arbitrary assignments instead of just group/fsm invokes?
- One issue might be semantics around implicit
group.done
guards in the transitions, since, now, it may not be necessary to invoke a group at a state. Static FSMs might get around this by having a differentfsm
construct specific to static FSMs. Should we do the same and have an "assignment FSM" construct?
- One issue might be semantics around implicit
- How do we allow specification for symbolic representations of state?
- Right now, my idea of a user-defined FSM is just "execute this group at this state". What if users have a different idea of what an FSM means?
- I'm thinking of an IDLE, CALC, DONE state situation, where control signals are accepted and emitted at every cycle. This probably goes back to allowing arbitrary assignments.