calyx Annotations for Data Path Components

Another shot to answer the evergreen undefinedness of undriven signals in Calyx (https://github.com/cucapra/calyx/discussions/922). In its most reduced operational form, it asks when is it safe to provide 'x as the default value for a port instead of '0.

The core of this proposal are two new attributes: @data and @control. The @data attribute marks a cell as a data path component, and the @control attribute marks a cell as a control path component. These attributes are mutually exclusive, and a cell can only have one of them.

comb primitive std_add[WIDTH](@data a: WIDTH, @data b: WIDTH) -> (out: WIDTH);
primitive std_mult_pipe[WIDTH](@control go: 1, @data a: WIDTH, @data b: WIDTH) -> (out: WIDTH, @control done: 1);
cells {
  @data add0 = std_add(32);
  @control incr = std_add(32);
  @data mult = std_mult_pipe(32); // mult.go must always be defined
}

An instance marked with @control requires all its ports to have '0 as the default value. An instance marked with @data additionally needs to mark some of its ports with the @data attribute to state that these can accept 'x as the default value. This is because even for data path components, certain ports like the go port might always need a non-'x value.

This proposal follows a safe-by-default approach to the problem: it is always safe to ignore the @data attributes and provide all ports with a default value of '0. The @control attribute is just a safety mechanism that can be used to catch errors early.

Conditions for `@control` Instances

Following the things enumerated in https://github.com/cucapra/calyx/discussions/922, here is a list of some constraints the forces an instance to be a control instance:

If it is used in the guard of an assignment
If it is used as the done condition of a group
If it is used as the conditional port for if or while
If it is used as the input to a @go port such as write_en
If it is used as an input for another @control instance

Note that if any port of an instance is used in these positions, the instance must be marked as a @control instance. Number (5) is the challenging one here since it's a recursive condition. However, I don't see any other way to address the problem of propagation of undefinedness from data path to the control path. The interesting question is whether this approach will leave any instances as @data instances.

We can design a new pass that can compute which cells are @control using the criteria above. If these conditions are necessary and sufficient, all the cells not marked with @control are @data cells.

Semantics

Everything up till now gives a very operational view of how to address the undefinedness problem. However, it doesn't give us a clear view on what the exact semantic properties are. As far as I can tell, the semantics needs to be defined in terms of these attributes. There is no a priori way to define which signals are data path and which are control path; we can only call programs undefined given an assignment of attributes to instances.

Sep 08 '22 14:09 rachitnigam

The Filament issue above points out an optimization that this can enable in the backend:

It occurs to me that this optimization should be done by Calyx and not Filament itself because this reasoning applies to all data ports. Specifically, rewriting:

in = guard ? data : 'x

into

in = data

Should always be safe

Sep 12 '22 14:09 rachitnigam

Finally catching up on this very nice idea, @rachitnigam, after your helpful explanation today!

I think the note about allowed optimizations above is especially important. It's a very cool upside that we can show that certain optimizations are allowed on always-never-undefined values that are not possible on maybe-undefined values!!

Just to jot down some high-level thoughts that came out of today's brief discussion:

One way to "phase" this work would be to hammer things out in a fully dynamic context first, then work out the static checking. That would look like this:
1. In the first phase, define the dynamic semantics of the undefined value (here denoted undef) and the dynamic events that should be errors. For example, for error conditions: a guard expression evaluating to undef is not allowed, and it is also an error for a @go port ever to be assigned to undef. For dynamic semantics, undef aggressively taints guard expressions (e.g., (undef < 42) == undef), and it propagates aggressively through parameters like std_add. Hopefully, we extend the interpreter to implement these semantics precisely.
2. In the second phase, add @data/@control annotations as a way of soundly avoiding the dynamic error conditions. The theorem is basically that a well-typed (with respect to these annotations) Calyx program cannot ever encounter an undef-related error at run time.
For the static checker, I think it would be nice to view this as an information flow type system—i.e., all types are qualified (so, like, @data 32 or @control 1). The design and goal guarantees would be pretty similar to nullability annotations, as in Java's nullability checker.
As a corollary, a fun way to approach the issue above with annotated cells would be to make cells qualifier-polymorphic. The problem here is that we want to support cells that are sometimes used as data and sometimes used as control. For example, you might want the out port of std_add(32) to sometimes be @data 32 and sometimes be @control 32, depending on the adder. Qualifier polymorphism would mean that you would pass in a qualifier parameter, in exactly the same way that you pass in a width. So you could declare the primitive to have all its ports have type @Q WIDTH, and Q would be a type-level variable. Then if you instantiate std_add(32, @data) or std_add(32, @control) to fill in that Q.
A big question in my mind is whether we want non-undefness to be required or enforced by the compiler.
- Required: The program itself needs to guarantee that the value is never undef. The compiler doesn't need to insert any zero fallbacks. This has somewhat more predictable behavior, I'd say, since there aren't any mysterious zeroes that appear implicitly; all values that happen at run time come from the program text.
- Enforced: The compiler inserts zero-fallback assignments for all @control ports. This is the current behavior (but we currently do this to all ports). It would be lower effort, especially on the frontends' parts, because this is pretty much what they're expecting anyway. But it's admittedly funkier in terms of semantics, because you don't get direct control over where the zeroes get injected.

Sep 13 '22 20:09 sampsyo

A big question in my mind is whether we want non-undefness to be required or enforced by the compiler.

One important consequence of answering this question is whether the optimization can be represented in Calyx or not. If the compiler enforces this property then the optimization occurs in backend while the Calyx code is being compiled in Verilog. On the other hand, if programs are required to maintain it, then we can represent the optimization within Calyx.

A middle ground solution is following the philosophy outlined in #518 and explicitly instantiating all the muxing in Calyx program before we even reach the backend.

Sep 14 '22 21:09 rachitnigam

Yeah, good point. I think it could be fun to think a bit about what the IR would look like in the "required" version, and what the relevant lowering/optimization passes would look like.

Sep 14 '22 23:09 sampsyo

We need to say something stronger here:

If it is used as an input for another @control instance

In general, a lot of the computations are going to end up transitively related to assignments to a group's done signal. For example, here is an image of a part of the writes-to graph for lpm.futil: Screenshot 2023-03-08 at 10 23 12 PM

The thing to notice is that most group[done] writes come from the done condition of a register (or something similar). Concretely, for assignment like this:

r.in = lt.out
r.write_en = 1'd1
group[done] = r.done;

We should still be able to mark lt.out as a @data cell because the r.done signal is going to contain a non-undef value regardless of what lt.out contains.

In contrast, for something like this:

r.in = lt.out
r.write_en = c.out ? 1'd1
group[done] = r.done;

We would need to mark c.out as a @control cell because whether or not r.done triggers depends upon c.out's value and because of this, it cannot be undef

Mar 09 '23 06:03 rachitnigam

Hmm; interesting example. You observed:

We would need to mark c.out as a @control cell because whether or not r.done triggers depends upon c.out's value

Correct me if I'm wrong, but in the statically-checked version of this system, we would need to require anything used in a guard to be @control, right? That is, it would be a compile-time error for a guard to have a @data type annotation. (Is that right?)

Mar 10 '23 17:03 sampsyo

Yup! Guards are not allowed to "glitch" and hold 'x values ever. Maybe there is some deeper principle at play here that puts guards in the same category as @control ports

Mar 13 '23 19:03 rachitnigam

calyx calyx copied to clipboard

Annotations for Data Path Components

Conditions for @control Instances

Semantics

calyx
calyx copied to clipboard

Conditions for `@control` Instances