calyx
calyx copied to clipboard
Annotations for Data Path Components
Another shot to answer the evergreen undefinedness of undriven signals in Calyx (https://github.com/cucapra/calyx/discussions/922). In its most reduced operational form, it asks when is it safe to provide 'x as the default value for a port instead of '0.
The core of this proposal are two new attributes: @data and @control. The @data attribute marks a cell as a data path component, and the @control attribute marks a cell as a control path component. These attributes are mutually exclusive, and a cell can only have one of them.
comb primitive std_add[WIDTH](@data a: WIDTH, @data b: WIDTH) -> (out: WIDTH);
primitive std_mult_pipe[WIDTH](@control go: 1, @data a: WIDTH, @data b: WIDTH) -> (out: WIDTH, @control done: 1);
cells {
@data add0 = std_add(32);
@control incr = std_add(32);
@data mult = std_mult_pipe(32); // mult.go must always be defined
}
An instance marked with @control requires all its ports to have '0 as the default value. An instance marked with @data additionally needs to mark some of its ports with the @data attribute to state that these can accept 'x as the default value. This is because even for data path components, certain ports like the go port might always need a non-'x value.
This proposal follows a safe-by-default approach to the problem: it is always safe to ignore the @data attributes and provide all ports with a default value of '0. The @control attribute is just a safety mechanism that can be used to catch errors early.
Conditions for @control Instances
Following the things enumerated in https://github.com/cucapra/calyx/discussions/922, here is a list of some constraints the forces an instance to be a control instance:
- If it is used in the guard of an assignment
- If it is used as the done condition of a group
- If it is used as the conditional port for
iforwhile - If it is used as the input to a
@goport such aswrite_en - If it is used as an input for another
@controlinstance
Note that if any port of an instance is used in these positions, the instance must be marked as a @control instance. Number (5) is the challenging one here since it's a recursive condition. However, I don't see any other way to address the problem of propagation of undefinedness from data path to the control path. The interesting question is whether this approach will leave any instances as @data instances.
We can design a new pass that can compute which cells are @control using the criteria above. If these conditions are necessary and sufficient, all the cells not marked with @control are @data cells.
Semantics
Everything up till now gives a very operational view of how to address the undefinedness problem. However, it doesn't give us a clear view on what the exact semantic properties are. As far as I can tell, the semantics needs to be defined in terms of these attributes. There is no a priori way to define which signals are data path and which are control path; we can only call programs undefined given an assignment of attributes to instances.
The Filament issue above points out an optimization that this can enable in the backend:
It occurs to me that this optimization should be done by Calyx and not Filament itself because this reasoning applies to all data ports. Specifically, rewriting:
in = guard ? data : 'x
into
in = data
Should always be safe
Finally catching up on this very nice idea, @rachitnigam, after your helpful explanation today!
I think the note about allowed optimizations above is especially important. It's a very cool upside that we can show that certain optimizations are allowed on always-never-undefined values that are not possible on maybe-undefined values!!
Just to jot down some high-level thoughts that came out of today's brief discussion:
- One way to "phase" this work would be to hammer things out in a fully dynamic context first, then work out the static checking. That would look like this:
- In the first phase, define the dynamic semantics of the undefined value (here denoted
undef) and the dynamic events that should be errors. For example, for error conditions: a guard expression evaluating toundefis not allowed, and it is also an error for a@goport ever to be assigned toundef. For dynamic semantics,undefaggressively taints guard expressions (e.g.,(undef < 42) == undef), and it propagates aggressively through parameters likestd_add. Hopefully, we extend the interpreter to implement these semantics precisely. - In the second phase, add
@data/@controlannotations as a way of soundly avoiding the dynamic error conditions. The theorem is basically that a well-typed (with respect to these annotations) Calyx program cannot ever encounter anundef-related error at run time.
- In the first phase, define the dynamic semantics of the undefined value (here denoted
- For the static checker, I think it would be nice to view this as an information flow type system—i.e., all types are qualified (so, like,
@data 32or@control 1). The design and goal guarantees would be pretty similar to nullability annotations, as in Java's nullability checker. - As a corollary, a fun way to approach the issue above with annotated cells would be to make cells qualifier-polymorphic. The problem here is that we want to support cells that are sometimes used as data and sometimes used as control. For example, you might want the
outport ofstd_add(32)to sometimes be@data 32and sometimes be@control 32, depending on the adder. Qualifier polymorphism would mean that you would pass in a qualifier parameter, in exactly the same way that you pass in a width. So you could declare the primitive to have all its ports have type@Q WIDTH, andQwould be a type-level variable. Then if you instantiatestd_add(32, @data)orstd_add(32, @control)to fill in thatQ. - A big question in my mind is whether we want non-undefness to be required or enforced by the compiler.
- Required: The program itself needs to guarantee that the value is never undef. The compiler doesn't need to insert any zero fallbacks. This has somewhat more predictable behavior, I'd say, since there aren't any mysterious zeroes that appear implicitly; all values that happen at run time come from the program text.
- Enforced: The compiler inserts zero-fallback assignments for all
@controlports. This is the current behavior (but we currently do this to all ports). It would be lower effort, especially on the frontends' parts, because this is pretty much what they're expecting anyway. But it's admittedly funkier in terms of semantics, because you don't get direct control over where the zeroes get injected.
A big question in my mind is whether we want non-undefness to be required or enforced by the compiler.
One important consequence of answering this question is whether the optimization can be represented in Calyx or not. If the compiler enforces this property then the optimization occurs in backend while the Calyx code is being compiled in Verilog. On the other hand, if programs are required to maintain it, then we can represent the optimization within Calyx.
A middle ground solution is following the philosophy outlined in #518 and explicitly instantiating all the muxing in Calyx program before we even reach the backend.
Yeah, good point. I think it could be fun to think a bit about what the IR would look like in the "required" version, and what the relevant lowering/optimization passes would look like.
We need to say something stronger here:
If it is used as an input for another @control instance
In general, a lot of the computations are going to end up transitively related to assignments to a group's done signal. For example, here is an image of a part of the writes-to graph for lpm.futil:

The thing to notice is that most group[done] writes come from the done condition of a register (or something similar). Concretely, for assignment like this:
r.in = lt.out
r.write_en = 1'd1
group[done] = r.done;
We should still be able to mark lt.out as a @data cell because the r.done signal is going to contain a non-undef value regardless of what lt.out contains.
In contrast, for something like this:
r.in = lt.out
r.write_en = c.out ? 1'd1
group[done] = r.done;
We would need to mark c.out as a @control cell because whether or not r.done triggers depends upon c.out's value and because of this, it cannot be undef
Hmm; interesting example. You observed:
We would need to mark
c.outas a@controlcell because whether or notr.donetriggers depends uponc.out's value
Correct me if I'm wrong, but in the statically-checked version of this system, we would need to require anything used in a guard to be @control, right? That is, it would be a compile-time error for a guard to have a @data type annotation. (Is that right?)
Yup! Guards are not allowed to "glitch" and hold 'x values ever. Maybe there is some deeper principle at play here that puts guards in the same category as @control ports