[Bug] Init block not discoverable after sch.blockize
When used on a block with an init statement, blockize creates a separate init block that is not discoverable by any means. This hinders further scheduling, such as tensorizing the init block.
Expected behavior
When using blockize on a loop that contains an init statement, the init is moved to a new block <block>_init that should be discoverable with get_block or get_child_blocks on the newly created outer block.
Actual behavior
The init block exists in the TIR module but does not seem to be registered by the schedule. get_block("<block>_init") fails with InternalError: Check failed: (it != self_->stmt2ref.end()) is false
Stacktrace
Traceback (most recent call last):
File "/home/dev/tvm_upstream/../tvm/playground/blockize_init_bug.py", line 31, in
Environment
Reproducible on main (d4056ca79571d4265a12beeedd1b1565953df936)
Steps to reproduce
import tvm
from tvm.script import ir as I
from tvm.script import tir as T

@I.ir_module
class Module:
    @T.prim_func
    def main():
        # with T.block("root"):
        A_sum = T.alloc_buffer((1,), "float32")
        A = T.alloc_buffer((1, 16), "float32")
        for nn, ff in T.grid(1, 16):
            with T.block("A"):
                v_nn, v_ff = T.axis.remap("SR", [nn, ff])
                T.reads(A[v_nn, v_ff])
                T.writes(A_sum[v_nn])
                with T.init():
                    A_sum[v_nn] = T.float32(0)
                A_sum[v_nn] = A_sum[v_nn] + A[v_nn, v_ff]

sch = tvm.tir.Schedule(Module)
a = sch.get_block("A")
loop_n, loop_f = sch.get_loops(a)
sch.blockize(loop_f)
print(sch.mod)  # <-- A_init exists
a_init = sch.get_block("A_init")  # <-- fails with InternalError: Check failed: (it != self_->stmt2ref.end()) is false
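To confirm the mismatch between the printed module and the schedule without reading the output by eye, a small diagnostic along the following lines might help. This is only a sketch: the helper names block_names_in_ir and find_undiscoverable are hypothetical (not part of TVM), and it assumes that tvm.tir.stmt_functor.post_order_visit and Block.name_hint behave on this revision as they do in other recent releases.

```python
def block_names_in_ir(sch, func_name="main"):
    """Collect the name of every tir.Block found in the scheduled function body."""
    from tvm import tir  # imported lazily so the generic helper below works without TVM

    names = []

    def visit(node):
        # post_order_visit calls this for every node; keep only blocks
        if isinstance(node, tir.Block):
            names.append(node.name_hint)

    tir.stmt_functor.post_order_visit(sch.mod[func_name].body, visit)
    return names


def find_undiscoverable(names, lookup):
    """Return the names for which lookup(name) raises, preserving input order."""
    missing = []
    for name in names:
        try:
            lookup(name)
        except Exception:  # broad on purpose: the exact error type varies across TVM versions
            missing.append(name)
    return missing
```

With the reproduction above, calling find_undiscoverable(block_names_in_ir(sch), sch.get_block) after sch.blockize(loop_f) should list any block that is present in sch.mod but cannot be retrieved through the schedule. If the behavior is as reported, the init block would be among them.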
Triage
- tir:schedule
I think you are supposed to call decompose_reduction before blockize:
import tvm
from tvm.script import ir as I
from tvm.script import tir as T

@I.ir_module
class Module:
    @T.prim_func
    def main():
        # with T.block("root"):
        A_sum = T.alloc_buffer((1,), "float32")
        A = T.alloc_buffer((1, 16), "float32")
        for nn, ff in T.grid(1, 16):
            with T.block("A"):
                v_nn, v_ff = T.axis.remap("SR", [nn, ff])
                T.reads(A[v_nn, v_ff])
                T.writes(A_sum[v_nn])
                with T.init():
                    A_sum[v_nn] = T.float32(0)
                A_sum[v_nn] = A_sum[v_nn] + A[v_nn, v_ff]

sch = tvm.tir.Schedule(Module)
a = sch.get_block("A")
loop_n, loop_f = sch.get_loops(a)
sch.decompose_reduction("A", loop_n)
sch.blockize(loop_f)
init_block = sch.get_block("A_init")
print(sch.mod)  # <-- A_init exists
For my usage scenario I need to keep the T.init() statement, so decompose_reduction is not an option. Also, the fact that the A_init block is present in the associated module but not discoverable through schedule accessors clearly indicates a bug, IMO.