circt
[Arc] Allow uniform operands to the VectorizeOp
The arc.vectorize op currently has no way to express that all vector lanes share the exact same input operand. This matters for vectorizing arc.state ops, whose clock, enable, and reset must be identical across all lanes for the op to be vectorizable.
For example,
arc.vectorize (%0, %1), (%2, %3), (%clk, %clk) {
^bb0(%arg0, %arg1, %arg2):
arc.state @Foo(%arg0, %arg1) clock %arg2
}
would vectorize into (pseudo-ops)
%v0 = vector_create %0, %1
%v1 = vector_create %2, %3
arc.state @FooVec(%v0, %v1) clock %clk
where the inputs to the arc @Foo can be vectorized, because we can vectorize the entire arc definition as @FooVec, but the clock %clk has to remain a scalar. This is only possible if all vector lanes use the same clock %clk.
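The uniformity condition can be illustrated with a small Python sketch. This is not CIRCT code: the helper name `split_uniform` and the modelling of operands as SSA name strings are assumptions made purely for illustration. It partitions operand groups into those that differ per lane and those where every lane carries the same value.

```python
def split_uniform(groups):
    """Partition operand groups into per-lane groups and uniform values.

    Each group is a sequence of SSA value names, one per vector lane
    (a hypothetical model, not the actual CIRCT data structures).
    """
    vectorized, uniform = [], []
    for group in groups:
        if len(set(group)) == 1:
            # Every lane uses the same value, so it can stay a scalar.
            uniform.append(group[0])
        else:
            # Lanes differ, so this group must become a vector.
            vectorized.append(tuple(group))
    return vectorized, uniform


vectorized, uniform = split_uniform(
    [("%0", "%1"), ("%2", "%3"), ("%clk", "%clk")]
)
```

For the example above, the two data groups end up in `vectorized` and `%clk` ends up in `uniform`. Note that this sketch treats any group with identical lanes as uniform, including data inputs; whether that is desirable is a separate question.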
It might be useful to capture the uniformity of %clk as part of the vectorize op itself:
arc.vectorize (%0, %1), (%2, %3) uniform %clk {
^bb0(%arg0, %arg1, %arg2):
arc.state @Foo(%arg0, %arg1) clock %arg2
}
This makes lowering easier, since the vectorize op allows its operands to be vectorized at a different point in time than its body. Without the uniform operands, the block arguments of the body would vectorize into something like %arg0: vector<2 x i42>, %arg1: vector<2 x i42>, %arg2: vector<2 x !seq.clock>. The arc.state op requires a scalar clock %arg2, however, so the lowering would not be possible. With the uniform operands, it is clear that %arg2 is a single uniform !seq.clock, which allows the vectorization of the body to occur.
To implement this, @maerhart suggested adding an additional group of variadic operands that contains all the uniform operands. For example:
arc.vectorize (a, b), (c, d), (e, f) {
^bb0(%vAB, %vCD, %e, %f):
...
}
The last operand group (e, f) is understood to be the uniform operands. It can be empty if there are no uniform operands. The body block would have one block argument for the first group (a, b), one block argument for the second group (c, d), and then one block argument for each operand in the final group (e, f).
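As a rough sketch of the resulting block signature, the following Python computes the body's block argument types from the operand groups. The function name `body_arg_types` and the use of type strings are assumptions for illustration only. The vector type spelling follows the notation used in this issue, not necessarily exact MLIR syntax.

```python
def body_arg_types(lane_groups, uniform_types):
    """Compute block argument types for the vectorized body.

    lane_groups:   one list of scalar type strings per vectorized group,
                   with one entry per lane.
    uniform_types: scalar type strings of the trailing uniform group
                   (may be empty).
    """
    args = []
    for lane_types in lane_groups:
        # All lanes of one group must share a single scalar type.
        if len(set(lane_types)) != 1:
            raise ValueError(f"mixed lane types: {lane_types}")
        args.append(f"vector<{len(lane_types)} x {lane_types[0]}>")
    # Uniform operands pass through as scalars, one block argument each.
    args.extend(uniform_types)
    return args


types = body_arg_types([["i42", "i42"], ["i42", "i42"]], ["!seq.clock"])
```

With two 2-lane groups of i42 and a single uniform clock, this gives two vector arguments followed by one scalar !seq.clock argument, matching the block argument layout described above.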