Halide
loop-unroll failure in Autoscheduler
Run the simple Blur generator below through the single-shot autoscheduler with the default weights. It fails with:
Can only unroll for loops over a constant extent.
Loop over repeat_edge.s0._0._0 has extent (((min((((select((0 < blurred_y.s0.x.x), (max(min((blurred_y.extent.0 - (blurred_y.s0.x.x*128)), 128), 0) + 31), 159)/32)*32) + (select((0 < blurred_y.s0.x.x), min((blurred_y.s0.x.x*128), blurred_y.extent.0), min((blurred_y.s0.x.x*128), (blurred_y.extent.0 + -128))) + blurred_y.min.0)), ((min((blurred_y.s0.x.x*128), (blurred_y.extent.0 + -128)) + blurred_y.min.0) + 128)) - (min(select((0 < blurred_y.s0.x.x), min((blurred_y.s0.x.x*128), blurred_y.extent.0), min((blurred_y.s0.x.x*128), (blurred_y.extent.0 + -128))), (min((blurred_y.s0.x.x*128), (blurred_y.extent.0 + -128)) + 96)) + blurred_y.min.0)) + 39)/32).
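To see concretely why this extent is not a compile-time constant, the printed expression can be hand-transcribed into plain C++ and evaluated for each tile index. This is a sketch, not Halide code. It assumes blurred_y.min.0 = 0 and an output width of 480 (kWidth), and it models Halide's integer division as floor division. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Floor division, matching Halide's integer division for a positive divisor
// (C++ '/' truncates toward zero instead).
int floor_div(int a, int b) {
    assert(b > 0);
    return (a >= 0) ? a / b : -((-a + b - 1) / b);
}

// Hand transcription of the extent printed for the loop over
// repeat_edge.s0._0._0, with
//   extent = blurred_y.extent.0, min = blurred_y.min.0, t = blurred_y.s0.x.x
int repeat_edge_extent(int extent, int min, int t) {
    // min(t*128, extent - 128): tile start clamped so the tile stays in bounds.
    const int shifted = std::min(t * 128, extent - 128);
    // select(0 < t, min(t*128, extent), min(t*128, extent - 128))
    const int s = (0 < t) ? std::min(t * 128, extent) : shifted;
    // select(0 < t, max(min(extent - t*128, 128), 0) + 31, 159)
    const int sel = (0 < t) ? std::max(std::min(extent - t * 128, 128), 0) + 31 : 159;
    const int a = floor_div(sel, 32) * 32 + (s + min);
    const int b = (shifted + min) + 128;
    const int c = std::min(s, shifted + 96) + min;
    return floor_div(std::min(a, b) - c + 39, 32);
}
```

With a width of 480 this gives 5, 5, 5 and 4 iterations for blurred_y.s0.x.x = 0, 1, 2, 3, while a width of 512 gives 5 for every tile. Because the trip count depends on both the tile index and the runtime width, Halide cannot simplify it to a constant, and the unroll check rejects it.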
Source:
```cpp
#include <cmath>
#include <cstdint>

#include "Halide.h"

namespace {

using namespace Halide;
using namespace Halide::BoundaryConditions;
using namespace Halide::ConciseCasts;

constexpr int kWidth = 480;
constexpr int kHeight = 640;
constexpr float kGamma = 1.45f;
constexpr int kRadius = 4;
constexpr int kDiameter = kRadius * 2 + 1;
constexpr int kRangeBits = 8;
constexpr uint32_t kRange = static_cast<uint32_t>(1) << kRangeBits;

Var x("x"), y("y");

inline float GaussianCoefficient(int r) {
    return std::exp(-0.5f * r * r / (kGamma * kGamma));
}

class Blur : public Halide::Generator<Blur> {
public:
    Input<Buffer<int8_t>> input_{"input", 2};
    Output<Buffer<int8_t>> output_{"output", 2};

    void generate() {
        Func input_bounded = repeat_edge(input_);

        // Fixed-point Gaussian taps, scaled so they sum to approximately kRange.
        int16_t coefficients_data[kDiameter];
        int16_t *coefficients = &coefficients_data[kRadius];  // valid for [-kRadius, kRadius]
        double sum = 0.0;
        for (int rx = -kRadius; rx <= kRadius; rx++) {
            sum += GaussianCoefficient(rx);
        }
        const double scale = kRange / sum;
        for (int rx = -kRadius; rx <= kRadius; rx++) {
            coefficients[rx] = static_cast<int16_t>(GaussianCoefficient(rx) * scale);
        }

        // Horizontal pass.
        Expr gx = i16(0);
        for (int rx = -kRadius; rx <= kRadius; rx++) {
            gx += i16(input_bounded(x + rx, y)) * Expr(coefficients[rx]);
        }
        Func blurred_x("blurred_x");
        blurred_x(x, y) = i8(gx / (1 << kRangeBits));

        // Vertical pass.
        Expr gy = i16(0);
        for (int ry = -kRadius; ry <= kRadius; ry++) {
            gy += i16(blurred_x(x, y + ry)) * Expr(coefficients[ry]);
        }
        Func blurred_y("blurred_y");
        blurred_y(x, y) = i8(gy / (1 << kRangeBits));

        output_ = blurred_y;

        // Size estimates used by the autoscheduler.
        input_.set_estimates({{0, kWidth}, {0, kHeight}});
        output_.set_estimates({{0, kWidth}, {0, kHeight}});

        // Hand-written schedule used when not autoscheduling.
        // `auto_schedule` is the built-in GeneratorParam of older Halide releases.
        if (!auto_schedule) {
            input_.dim(0).set_bounds(0, kWidth).set_stride(1);
            input_.dim(1).set_bounds(0, kHeight).set_stride(kWidth);
            blurred_x.compute_root();
            output_.compute_root();
        }
    }
};

}  // namespace

// Registration name "blur" is illustrative; the original report did not show it.
HALIDE_REGISTER_GENERATOR(Blur, blur)
```
attn @kpassarella
This is an interaction between the ShiftInwards tail strategy and sliding window that makes a loop extent hard to analyze. One fix might be to make the autoscheduler stop assuming that the loop extents of anything that gets slid are constant.
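The interaction can be illustrated with a toy 1-D model: a producer consumed tile by tile, where each tile start is clamped ShiftInwards-style and a sliding window skips whatever earlier tiles already produced. This is a simplified sketch, not Halide's sliding-window implementation. It assumes a tile width of 128 (the value appearing in the error message), ignores the stencil halo, and the function name slid_extents is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model: for each tile, how many producer elements are newly computed.
std::vector<int> slid_extents(int width, int tile) {
    assert(tile > 0 && width >= tile);
    std::vector<int> extents;
    int computed_end = 0;  // producer values [0, computed_end) already exist
    const int num_tiles = (width + tile - 1) / tile;
    for (int t = 0; t < num_tiles; ++t) {
        // ShiftInwards-style clamp: the last tile is moved back inside the width.
        const int start = std::min(t * tile, width - tile);
        const int end = start + tile;
        // Sliding window: compute only the part not produced by earlier tiles.
        const int new_start = std::max(start, computed_end);
        extents.push_back(end - new_start);
        computed_end = end;
    }
    return extents;
}
```

For a width of 480 this returns {128, 128, 128, 96}: the shifted last tile overlaps the previous one, so only 96 new elements remain, whereas a width of 512 yields 128 for every tile. Any loop nested inside that region therefore has a trip count that depends on the tile index.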
I've wondered whether it's a mistake to have sliding window applied implicitly. Was it ever considered as an explicit scheduling directive?
Hi, I am facing the same issue. For now, I set HL_PERMIT_FAILED_UNROLL=1 so that my code compiles. Is there another solution or workaround that avoids this error? Thank you.
Is this still open/active?
I believe I've run into this as well, using the resize example and the Adams2019 autoscheduler:
Unhandled exception: Error: Can only unroll for loops over a constant extent.
Loop over kernel_y.s0.k has extent int32((float32)ceil_f32((1.000000f/(float32)scale_factor)*4.000000f)).
Unhandled exception: Error: Can only unroll for loops over a constant extent.
Loop over unnormalized_kernel_y.s0.k has extent int32((float32)ceil_f32((1.000000f/(float32)scale_factor)*4.000000f)).
Unhandled exception: Error: Can only unroll for loops over a constant extent.
Loop over unnormalized_kernel_y.s0.k has extent (max(kernel_y.s0.y.y.y/((output_buffer.extent.1 + 15)/16), 3) + 1).
I'm hitting the same problem when my code is autoscheduled. Is there any workaround?