LLVM ERROR from Halide 18.0.0
I got the error below when using Halide-18.0.0-x86-64-windows-41bc134ae9a8fa32d968867ac1aeeac6f63a142e, which I downloaded from https://buildbot.halide-lang.org/:
LLVM ERROR: Cannot select: t37: ch = masked_store<(store unknown-size into %ir.sum15, align 64, !tbaa !45)> t0, t28, FrameIndex:i64<0>, undef:i64, t35
t28: v4f64 = BUILD_VECTOR ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>
t13: f64 = ConstantFP<0.000000e+00>
t13: f64 = ConstantFP<0.000000e+00>
t13: f64 = ConstantFP<0.000000e+00>
t13: f64 = ConstantFP<0.000000e+00>
t12: i64 = FrameIndex<0>
t15: i64 = undef
t35: v4i1 = setcc t30, t33, setle:ch
t30: v4i32 = extract_subvector t2, Constant:i64<0>
t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %23
t1: v8i32 = Register %23
t29: i64 = Constant<0>
t33: v4i32 = extract_subvector t4, Constant:i64<0>
t4: v8i32,ch = CopyFromReg t0, Register:v8i32 %24
t3: v8i32 = Register %24
t29: i64 = Constant<0>
In function: Convolve
My Halide Generator class is attached: myHalideGenerator.txt
My command to run my Halide Generator class is: myHalideGenerator.exe -g Convolve -f Convolve input.type=float64 kernel.type=float64 output.type=float64 target=x86-64-windows-large_buffers-enable_llvm_loop_opt-avx512-avx2-avx-sse41-no_runtime-no_asserts -o ./
I found that this error also happens with the x86-64-osx package.
I can confirm this is caused by the avx512 feature flag.
❯ DYLD_LIBRARY_PATH=../../../distrib/lib ./Convolve -g Convolve -f Convolve input.type=float64 kernel.type=float64 output.type=float64 target=host-avx512-no_runtime-no_bounds_query -o ./
LLVM ERROR: Cannot select: 0x7fd8ca04c2d0: ch = masked_store<(store unknown-size into %ir.lsr.iv21, align 8, !tbaa !51)> 0x7fd8ca041890, 0x7fd8ca046910, 0x7fd8ca045a60, undef:i64, 0x7fd8ca04a810
0x7fd8ca046910: v4f64,ch = load<(dereferenceable load (s256) from %ir.sum38, align 64, !tbaa !35)> 0x7fd8ca046400, FrameIndex:i64<0>, undef:i64
0x7fd8ca046240: i64 = FrameIndex<0>
0x7fd8ca04c8f0: i64 = undef
0x7fd8ca045a60: i64,ch = CopyFromReg 0x7fd8c9909f60, Register:i64 %89
0x7fd8ca0461d0: i64 = Register %89
0x7fd8ca04c8f0: i64 = undef
0x7fd8ca04a810: v4i1 = setcc 0x7fd8ca04a180, 0x7fd8ca04bfc0, setle:ch
0x7fd8ca04a180: v4i32 = extract_subvector 0x7fd8ca0a5b70, Constant:i64<0>
0x7fd8ca0a5b70: v8i32,ch = CopyFromReg 0x7fd8c9909f60, Register:v8i32 %44
0x7fd8ca045b40: v8i32 = Register %44
0x7fd8ca04be70: i64 = Constant<0>
0x7fd8ca04bfc0: v4i32 = extract_subvector 0x7fd8ca046f30, Constant:i64<0>
0x7fd8ca046f30: v8i32,ch = CopyFromReg 0x7fd8c9909f60, Register:v8i32 %45
0x7fd8ca0ac720: v8i32 = Register %45
0x7fd8ca04be70: i64 = Constant<0>
In function: Convolve
The pipeline compiles fine without avx512.
@jxl1080 I updated your generator to this:
```cpp
#include "Halide.h"

using namespace Halide;

class Convolve : public Halide::Generator<Convolve> {
public:
    // We declare the Inputs to the Halide pipeline as public
    // member variables. They'll appear in the signature of our generated
    // function in the same order as we declare them.
    Input<Buffer<>> input{"input", 2};
    Input<Buffer<>> kernel{"kernel", 1};
    Input<uint32_t> outputDim{"inputLen"};
    Output<Buffer<>> output{"output", 2};

private:
    Var x{"x"}, c{"c"};
    Expr filterLen;

public:
    // We then define a method that constructs the Halide
    // algorithm pipeline:
    void generate() {
        // The kernel length is read from the buffer's own extent,
        // so it does not need to be passed in as a separate argument.
        filterLen = kernel.dim(0).extent();
        Halide::RDom rk(0, filterLen);
        output(x, c) = Halide::sum(kernel(rk.x) * input(x + rk.x, c));
    }

    // Scheduling:
    void schedule() {
        Expr vectorSize = natural_vector_size(output.type());
        output.vectorize(x, vectorSize, TailStrategy::GuardWithIf);
    }
};

HALIDE_REGISTER_GENERATOR(Convolve, Convolve)
```
I tried mcourteaux's modified generator class, and it still fails with avx512, so a fix for this bug is still needed.
LLVM ERROR: Cannot select: t37: ch = masked_store<(store unknown-size into %ir.sum15, align 64, !tbaa !45)> t0, t28, FrameIndex:i64<0>, undef:i64, t35
This may well be a bug in LLVM 18 (rather than Halide itself). Can you try with top-of-tree LLVM + top-of-tree Halide and see if it still repros?
I was just trying to give some feedback; it was by no means meant as a fix. I was showing you that you can access buffer extents directly, so you don't have to pass them explicitly as extra arguments.
I am also encountering the same error when compiling with the avx512 flag in Halide v19. However, compilation works fine when targeting the other avx512 variants, such as avx512_cannonlake, avx512_skylake, and avx512_zen4.
Additionally, I do not face this issue when using the avx512 flag with Halide v16.
Any advice on how to resolve this?
@abadams, could you take a look at this? I often see you working on the avx512 codegen. Please see my comment above; it might save you some time.
That's an LLVM bug, so there's possibly not much we can do about it. I wouldn't ever use the avx512 flag by itself. It asks for the lowest-common-denominator AVX-512, which is the intersection of the instructions supported by both AVX-512 CPUs and the Xeon Phi accelerators from a few years ago. This amounts to the AVX512 F and CD extensions. I'd use at least avx512_skylake, which is the F, CD, BW, VL, and DQ extensions. I'm trying to figure out whether there is any appreciable number of CPUs out there that have avx512f but not avx512bw. Wikipedia claims some of the early Xeon Skylake-SP processors didn't have it, but every specific processor in that category that I check on WikiChip claims to have it, so I'm not sure who's wrong here.
Given all these issues, we should just deprecate or remove that flag, since it's both buggy and apparently not useful for real-world hardware.