calyx
calyx copied to clipboard
Combinational loops created when `invoke`d component has single group in control sequence
I think the title describes what is happening, but I'm not sure. Any input people have is appreciated!
I’ve tried to pin down where some combinational loops are being generated when using invokes. I think it’s a symptom of the invoke getting “hoisted” in a way such that the invoked component’s done signal relies on an input signal, after hoisting.
It seems like a main with a group that writes to a “concrete” register, followed by an invoke to a sub-component that writes to that same register, as a ref, creates a combinational loop. Something like this:
component invokee() -> () {
cells {
ref ref_reg = std_reg(32);
}
wires {
//writes to the ref register
group invokee_ref_reg_write {
ref_reg.in = 32'd0;
ref_reg.write_en = 1'd1;
invokee_ref_reg_write[done] = ref_reg.done;
}
}
control {
invokee_ref_reg_write;
}
}
component wrapper<"toplevel"=1>() -> () {
cells {
concrete_reg = std_reg(32);
invokee = invokee();
}
wires {
//writes to the concrete register
group invoker_reg_write {
concrete_reg.in = 32'd0;
concrete_reg.write_en = 1'd1;
invoker_reg_write[done] = concrete_reg.done;
}
}
control {
invoker_reg_write;
invoke invokee[ref_reg=concrete_reg]()();
}
}
Looking at the the flattened version of the above program, I think this is a symptom of #621. The empty control block of invokee, and the combinational dependence of invokee.done on an input to the component, I think is what @rachitnigam describes in this comment coming into play.
In terms of an actionable way forward, one approach could modify the compile_invoke pass to flatten such components that only have a single group in their control block into the component doing the invoking itself? So in the flattened example, we would remove invokee as a component and have invokee's sole group invokee_ref_reg_write live in the wires block of wrapper. In this way we would avoid the combinational-groups that get produced with the current pass.
Happy to hear any thoughts about other possible ways forward.
In case it's of interest, the verilog produced can be found here
And the error message verilator produces is
%Warning-UNOPTFLAT: dumpvars.v:380:7: Signal unoptimizable: Circular combinational logic: 'wrapper.invokee_ref_reg_done'
380 | logic invokee_ref_reg_done;
| ^~~~~~~~~~~~~~~~~~~~
... For warning description see https://verilator.org/warn/UNOPTFLAT?v=5.006
... Use "/* verilator lint_off UNOPTFLAT */" and lint_on around source to disable this message.
dumpvars.v:380:7: Example path: wrapper.invokee_ref_reg_done
dumpvars.v:532:22: Example path: ASSIGNW
dumpvars.v:391:7: Example path: wrapper.invoke1_go_in
dumpvars.v:517:29: Example path: ASSIGNW
dumpvars.v:380:7: Example path: wrapper.invokee_ref_reg_done
dot -Tpdf -o ~/a.pdf sim_build//Vwrapper_059_unoptflat.dot
%Error: Exiting due to 1 warning(s)
Sorry if this is a dumb question, but what is the easiest way to reproduce this error?
As in, what command should I run?
Not dumb! If the example above is named minimized.futil then
fud2 minimized.futil --from calyx --to verilog-noverify -o minimized-verilog.v
then
verilator -cc --exe -Mdir sim_build/ -DCOCOTB_SIM=1 --top-module wrapper --timing --timescale 1ns/1ps ./minimized-verilog.v
should reproduce.
FWIW, while I continue to dig into this a bit, here's a one-liner for detecting the problem:
verilator --lint-only --top-module wrapper =(fud2 minimized.futil --to verilog)
=(...) is a bash-ism that puts a command's stdout into a temporary file and passes along that filename.
This one's a doozy!! Thanks for the compact reproduction.
understanding the problem
I tried to pull out the core of the problem in the lowered Calyx. The offending lines are:
In wrapper:
invokee.go = invoke0_go.out ? 1'd1;
invokee.ref_reg_done = invoke0_go.out ? concrete_reg.done;
invoke0_go.in = !invoke0_done.out & fsm.out == 2'd1 & tdcc_go.out ? 1'd1;
invoke0_done.in = invokee.done;
In invokee:
done = invokee_ref_reg_write_done.out ? 1'd1;
invokee_ref_reg_write_done.in = ref_reg_done;
invokee_ref_reg_write_go.in = go;
Simplifying all the std_wires and stuff down, the combinational cycle looks sorta like this:
invokee.ref_reg_done = !invokee.done & (some FSM stuff) ? concrete_reg.done;
invokee.done = invokee.ref_reg_done;
That is, the compiler is trying to make sure ref_reg_done (like all the assignments involved in the invocation) only gets assigned when the invocation is currently ongoing, i.e., the invokee is not yet done. But the invocation wants to determine whether or not it is done using ref_reg_done. Hence the cycle.
Another way of seeing the problem is to look at the result of compile-invoke, which generates this group:
group invoke0 {
concrete_reg.in = invokee.ref_reg_in;
concrete_reg.write_en = invokee.ref_reg_write_en;
invokee.ref_reg_out = concrete_reg.out;
invokee.ref_reg_done = concrete_reg.done;
invokee.go = 1'd1;
invoke0[done] = invokee.done;
}
These lines cause the problem:
invokee.ref_reg_done = concrete_reg.done;
invoke0[done] = invokee.done;
The assignment to ref_reg_done needs to be guarded on the group's doneness. But of course this assignment also determines the group's doneness. The compiler can't know that only by looking at wrapper; divining this dependency requires a whole-program analysis.
potential solutions
Way back when, in https://github.com/calyxir/calyx/issues/621#issuecomment-1118984403, @rachitnigam distilled one option:
Another ponderous solution to this problem, or rather a guarantee that should possibly be provided by the compiler, is that there is no combinational path between a component's
goanddonesignals.
Which we would have to generalize (disallow combinational paths from any input to the done port). That would do it; the only problem is that this perfectly innocuous-looking ref generates a component that does just that (i.e., creates a combinational path from an input to done where one did not otherwise exist).
While this requirement seems awkward, maybe it is necessary to make it feasible to implement invokes correctly. We would then have the problem of figuring out what to do with ref cells that are wired up to done ports… disallowing them seems pretty annoying, but maybe we can do something better, such as marking the relevant input port to say "hey, this one is actually my done signal."
And @nathanielnrn suggests:
In terms of an actionable way forward, one approach could modify the
[compile_invokepass](https://github.com/calyxir/calyx/blob/ed4ce6e5980dfb61a289354231dd514105248c7e/calyx-opt/src/passes/compile_invoke.rs) to flatten such components that only have a single group in their control block into the component doing the invoking itself?
This would amount to inlining the invoke. Which is a perfectly reasonable thing to do, but it feels kinda bad to require inlining for correctness. Deciding whether or not to inline based on the control program is also a little problematic: it seems to imply that there is an underlying criterion that makes a given component require inlining for correctness, but that we can only detect this property if it has not had its control compiled away yet. So we would still have problems with components that behave the same way but are fully structural (e.g., the result of TDCC'ing a single-statement component).
an aside
As an aside, it would be very handy to have (but probably not worth it to build) a tool that can detect combinational cycles at the Calyx level, just for debugging compiler bugs, without emitting Verilog...
In terms of an actionable way forward, one approach could modify the compile_invoke pass to flatten such components that only have a single group in their control block into the component doing the invoking itself?
Another possible approach I've thought of in the past is making sure that every component gets an FSM, even if there is exactly one group in it. This would allow us to redirect the go and done signals and make sure that we do not get combinational loops.
For assignments of this form:
invokee.ref_reg_done = concrete_reg.done;
There is some sort of "done-forwarding" being performed that should not be treated as a normal assignment. This is possibly related to #1034 where we have a similar problem with clk and reset signals needing special-cased forwarding compared to normal data signals.
I think there is some higher-level problem/question about how to think about control vs. data signals (which keeps coming up) and something that we should address with a possible redesign.
=(fud2 minimized.futil --to verilog)
I did not know this! Super useful!
Having thought about this a bit more, I think it can be useful to have a component's/group's done signal be driven by an input. For example an AXI write channel can be considered done when WLAST is high (To be precise, it would actually be done in the cycle after WLAST is high, but I think this gets the point across). So I think that
making sure that every component gets an FSM, even if there is exactly one group in it
might be the right way forward.
Thanks for the discussion. To distill the option space, one way to think about it is that we need to respect the Calyx requirement that every (non-combinational) component takes at least 1 cycle to run. I think there are two possible directions:
- prohibit, distinguish, or specially handle component inputs that are wired up to their
@donesignal - compile components like the one in this example so that, even when an input is wired up to
@done, they still take one cycle to run (i.e., they ignore the@done-wired input until the cycle after their@gogoes high)
I think "making sure every component gets an FSM" amounts to a pretty good approach to the second path.