
[TIR] Fix plan buffer allocation location for loop carried dependencies

Open wrongtest-intellif opened this issue 3 years ago • 0 comments

The pass PlanAndUpdateBufferAllocationLocation appears to place buffer allocations incorrectly when the indices used to access a buffer involve a loop-carried dependency. For example:

```python
from tvm.script import tir as T


@T.prim_func
def test(A: T.Buffer[(8, 8), "int32"], B: T.Buffer[(8, 8), "int32"]):
    C = T.alloc_buffer([8, 8], "int32")
    for i in range(8):
        for j in range(8):
            with T.block("b0"):
                vi = T.axis.spatial(8, i)
                vj = T.axis.spatial(8, j)
                C[vi, vj] = A[vi, vj] + vi
        for j in range(8):
            with T.block("b1"):
                # vi is opaque: b1 reads C at rows vi and vi - 1 (loop-carried)
                vi = T.axis.opaque(8, i)
                vj = T.axis.spatial(8, j)
                B[vi, vj] = C[vi, vj] + T.if_then_else(vi > 0, C[vi - 1, vj], vi, dtype="int32")
```

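On iteration i, block b1 reads the intermediate buffer C at both row i and row i-1, so it depends on the values that b0 wrote to C in iterations i and i-1. Therefore the allocation of C must not be placed under loop i, even though loop i is the LCA (lowest common ancestor) position that the current planning strategy picks.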
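To make the dependency concrete, here is a toy simulation in plain Python (no TVM) of the prim_func above. It is a sketch under stated assumptions: the helper run is hypothetical, the input A[i][j] = i * N + j is arbitrary sample data, and a fresh allocation under loop i is modeled as a zero-filled buffer, whereas real uninitialized memory holds undefined values.

```python
# Toy model (plain Python, no TVM) of the example prim_func, comparing two
# placements of the allocation of the intermediate buffer C. The helper `run`
# and the sample input A are hypothetical; a fresh allocation is modeled as
# zeros, although real uninitialized memory holds undefined values.
N = 8


def run(alloc_under_loop_i: bool):
    A = [[i * N + j for j in range(N)] for i in range(N)]
    B = [[0] * N for _ in range(N)]
    C = [[0] * N for _ in range(N)]  # allocation above loop i
    for i in range(N):
        if alloc_under_loop_i:
            # Allocation under loop i: each iteration gets a fresh C, so
            # row i - 1 written by b0 in the previous iteration is gone.
            C = [[0] * N for _ in range(N)]
        for j in range(N):  # block b0
            C[i][j] = A[i][j] + i
        for j in range(N):  # block b1: reads rows i and i - 1 of C
            B[i][j] = C[i][j] + (C[i - 1][j] if i > 0 else i)
    return B


correct = run(alloc_under_loop_i=False)
buggy = run(alloc_under_loop_i=True)
print(correct[1][1], buggy[1][1])  # 11 10
```

Row 0 agrees in both placements because b1 does not read row i-1 when i is 0; from row 1 onward, the placement under loop i loses the previous row and produces different results.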

To fix the issue, we make DetectBufferLCA aware of opaque block iters (loop-carried dependencies and other complex access patterns are categorized as opaque in the block iter type annotation).

The change enforces that every legal "ancestor" of a buffer's accesses must dominate all loops bound to the opaque block iters that appear in the buffer indices. For example, since vi is opaque and the indices of buffer C use vi, loop i must lie strictly under the planned allocation point of C.
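The rule can be sketched as a toy model in plain Python. This is not TVM's DetectBufferLCA (which is implemented in C++); Node, lca and plan_alloc_point are hypothetical names, and the loop tree is a simplified stand-in for the TIR AST. Each buffer access contributes its block to the LCA, and each opaque block iter used in the indices additionally forces the allocation point above the loop that the iter is bound to.

```python
from typing import Dict, List, Optional, Set, Tuple


class Node:
    """A loop or block in a simplified loop tree (hypothetical stand-in for TIR)."""

    def __init__(self, name: str, parent: Optional["Node"] = None):
        self.name = name
        self.parent = parent
        self.depth = 0 if parent is None else parent.depth + 1


def lca(a: Node, b: Node) -> Node:
    # Lift the deeper node until depths match, then lift both in lockstep.
    while a.depth > b.depth:
        a = a.parent
    while b.depth > a.depth:
        b = b.parent
    while a is not b:
        a, b = a.parent, b.parent
    return a


def plan_alloc_point(
    accesses: List[Tuple[Node, Set[str]]], opaque_loop_of: Dict[str, Node]
) -> Node:
    """accesses: (accessing block, opaque block iters used in its buffer indices).
    opaque_loop_of: opaque block iter name -> loop that the iter is bound to.
    Assumes no loop is the root of the tree (each loop has a parent)."""
    point: Optional[Node] = None
    for block, opaque_iters in accesses:
        point = block if point is None else lca(point, block)
        for it in opaque_iters:
            loop = opaque_loop_of[it]
            # The loop must sit strictly under the allocation point, so the
            # point must be an ancestor of (or equal to) the loop's parent.
            point = lca(point, loop.parent)
    assert point is not None, "buffer has no accesses"
    return point


# The loop nest of the example: root -> i -> {j (b0), j (b1)}.
root = Node("root")
loop_i = Node("i", root)
b0 = Node("b0", Node("j0", loop_i))
b1 = Node("b1", Node("j1", loop_i))

naive = lca(b0, b1)  # current strategy: plain LCA of the accesses
planned = plan_alloc_point([(b0, set()), (b1, {"vi"})], {"vi": loop_i})
print(naive.name, planned.name)  # i root
```

In this model the plain LCA of b0 and b1 is loop i, which matches the current plan; adding the opaque-iter constraint for vi in b1 hoists the allocation point of C to the function root.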

For an interesting workload involving loop-carried dependencies, see https://discuss.tvm.apache.org/t/rfc-introducing-a-rolling-buffer-scheduling-primitive/9836, where intermediate results from the previous iteration are reused as much as possible.

cc @Hzfengsy @junrushao1994

wrongtest-intellif, Sep 12 '22 17:09