Halide
RuntimeError: Internal Error at (...)/PartitionLoops.cpp:946 triggered by user code at : Unexpected construct inside if statement
I have a Pipeline with several outputs, and I want to compute them all in the same GPU kernel rather than in separate GPU kernels. I am trying to use compute_with to achieve this. The output of print_loop_nest looks as I want, but unfortunately I hit an internal error when calling compile_to_static_library, and I am not sure what causes it.
The full error output is here: https://gist.githubusercontent.com/apartridge/7bc93472d21d8585cfcbf3aff343944d/raw/9c54a2a0343bb56f26a54f83999e89845ced9765/gistfile1.txt
We are using Halide version 13.0.4 with some patches/bug fixes on top (which hopefully do not affect this).
Reproducer:
from halide import Func, LoopLevel, Pipeline, ImageParam, Float, UInt, Var, Target, get_host_target, TargetFeature


def find_gpu_target() -> Target:
    # Keep the host OS/arch/bitness and add OpenCL. NoRuntime leaves the
    # Halide runtime out of the generated library.
    host_target = get_host_target()
    operating_system, architecture, bitness = host_target.os, host_target.arch, host_target.bits
    target_extensions = [
        TargetFeature.OpenCL,
        TargetFeature.NoRuntime,
    ]
    return Target(operating_system, architecture, bitness, target_extensions)


def create_kernel(input_a: ImageParam, input_b: ImageParam) -> Pipeline:
    col = Var("col")
    row = Var("row")
    img = Var("img")
    col_outer = Var("col_outer")
    row_outer = Var("row_outer")
    col_inner = Var("col_inner")
    row_inner = Var("row_inner")

    def gpu_tile_in_place(func):
        # reorder() lists dimensions innermost first, so img becomes the innermost loop.
        if func.dimensions() == 3:
            func.reorder(img, col, row)
        func.compute_root()
        # 16x16 tiles: col_outer/row_outer map to GPU blocks, col_inner/row_inner to GPU threads.
        func.gpu_tile(col, row, col_outer, row_outer, col_inner, row_inner, 16, 16)

    out_a = Func("out_a")
    out_a[col, row, img] = input_a[col, row, img]
    out_b = Func("out_b")
    out_b[col, row, img] = input_b[col, row, img]

    gpu_tile_in_place(out_a)
    gpu_tile_in_place(out_b)

    # Disable this line and this program runs fine
    out_b.compute_with(LoopLevel(out_a, img, stage_index=0))

    output = Pipeline([out_a, out_b])
    output.print_loop_nest()
    return output


input_a = ImageParam(
    type=Float(32),
    dimensions=3,
    name="input_a",
)
input_b = ImageParam(
    type=UInt(8),
    dimensions=3,
    name="input_b",
)

kernel = create_kernel(
    input_a=input_a,
    input_b=input_b,
)

params = [input_a, input_b]
kernel.compile_to_static_library(
    "/tmp/tmp.a",
    params,
    "testProgram",
    find_gpu_target(),
)
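For comparison, the loop structure this schedule is meant to produce can be modelled in plain Python. This is only a rough sketch under stated assumptions: `fused_copy` and `TILE` are hypothetical names, not Halide API, and the code is not derived from Halide's actual lowering. It mirrors `reorder(img, col, row)` plus `gpu_tile(..., 16, 16)` on both outputs, followed by `out_b.compute_with(LoopLevel(out_a, img))`. The block loops, thread loops and innermost `img` loop are shared, so a single traversal writes both outputs, which is what one fused GPU kernel would do.

```python
# Hypothetical plain-Python model of the intended fused loop nest.
# Not Halide code: it only illustrates the loop order the schedule aims for.

TILE = 16  # matches the 16x16 gpu_tile factors in the reproducer


def fused_copy(input_a, input_b, width, height, images):
    """Copy both inputs in one traversal, as a single fused kernel would.

    input_a / input_b are nested lists indexed [img][row][col].
    """
    out_a = [[[None] * width for _ in range(height)] for _ in range(images)]
    out_b = [[[None] * width for _ in range(height)] for _ in range(images)]
    # Outer tile loops: in the Halide schedule these become GPU block indices.
    for row_outer in range((height + TILE - 1) // TILE):
        for col_outer in range((width + TILE - 1) // TILE):
            # Inner tile loops: in the Halide schedule these become GPU thread indices.
            for row_inner in range(TILE):
                for col_inner in range(TILE):
                    row = row_outer * TILE + row_inner
                    col = col_outer * TILE + col_inner
                    # This sketch simply skips points past the edge; Halide's
                    # own tail strategy may handle partial tiles differently.
                    if row >= height or col >= width:
                        continue
                    # img is the innermost loop. Fusing at img means both
                    # outputs are written in this same loop body.
                    for img in range(images):
                        out_a[img][row][col] = input_a[img][row][col]
                        out_b[img][row][col] = input_b[img][row][col]
    return out_a, out_b


if __name__ == "__main__":
    # 20x18 is deliberately not a multiple of 16, so edge tiles are exercised.
    a = [[[float(c + r) for c in range(20)] for r in range(18)] for _ in range(2)]
    b = [[[(c * r) % 256 for c in range(20)] for r in range(18)] for _ in range(2)]
    copied_a, copied_b = fused_copy(a, b, width=20, height=18, images=2)
    print(copied_a == a, copied_b == b)  # prints: True True
```

Without the compute_with line, each output would instead get its own copy of this loop nest, i.e. two separate kernels.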