Improve `schedule/as_matmul` matching on hybrid loop-libop code
The current `schedule/as_matmul` can't match some basic cases when there are both loops and libop calls. Example:

```python
for i in range(500):
    c[i] = a[i] @ ft.transpose(b, (1, 0))
```
This code cannot be mapped to a `Matmul` node because libop introduces intermediate local variables. The code is effectively:

```python
for i in range(500):
    t = ft.transpose(b, (1, 0))
    u = a[i] @ t
    c[i] = u
```
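For reference, the loop being matched is just a batched matrix multiply with the second operand transposed. A NumPy sketch (all shapes assumed for illustration) shows that the looped form and a single batched matmul, which is what a `Matmul` node would express, compute the same thing:

```python
import numpy as np

# Assumed shapes for illustration: a is a batch of 500 matrices,
# b is a single matrix shared across the batch.
a = np.random.rand(500, 4, 6)
b = np.random.rand(3, 6)

# Looped form, as in the example above.
c_loop = np.empty((500, 4, 3))
for i in range(500):
    c_loop[i] = a[i] @ b.T  # b.T corresponds to ft.transpose(b, (1, 0))

# The whole loop is a single batched matmul.
c_matmul = a @ b.T

assert np.allclose(c_loop, c_matmul)
```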
In order to deal with this case, we need the following two changes:
- Call `inline` on every variable inside the matching sub-tree. There will be no side effect because the matching will fail otherwise. This will solve the problem of `t` in this example.
- Implement a new schedule, maybe named `anti_inline`, that removes an intermediate variable and redirects all `Store`s to it to the final destination. We can implement `anti_inline` only for variables that are copied to another variable with modifications (like `c[i] = u`). There will also be no side effect because the matching will fail otherwise. This will solve the problem of `u`.
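To make the intent of the two rewrites concrete, here is a minimal sketch on a toy statement list (the representation, helper names, and string-based substitution are all invented for illustration; the real schedules operate on the AST):

```python
# Toy IR: a list of (target, expression) statements with expressions
# kept as plain strings. This mirrors the lowered libop code above.
stmts = [
    ("t", "ft.transpose(b, (1, 0))"),
    ("u", "a[i] @ t"),
    ("c[i]", "u"),
]

def inline(stmts, var):
    """Sketch of `inline`: substitute a temporary's defining expression
    into its uses and drop the definition."""
    defs = {tgt: expr for tgt, expr in stmts}
    out = []
    for tgt, expr in stmts:
        if tgt == var:
            continue  # drop the definition of the inlined variable
        out.append((tgt, expr.replace(var, defs[var])))
    return out

def anti_inline(stmts, var):
    """Sketch of `anti_inline`: find the copy of `var` to its final
    destination, drop the copy, and redirect the store to `var` there."""
    dest = next(tgt for tgt, expr in stmts if expr == var)
    return [(dest if tgt == var else tgt, expr)
            for tgt, expr in stmts if expr != var]

stmts = inline(stmts, "t")       # solves the problem of t
stmts = anti_inline(stmts, "u")  # solves the problem of u
print(stmts)  # [('c[i]', 'a[i] @ ft.transpose(b, (1, 0))')]
```

After both rewrites, the loop body is a single store of a matmul expression into `c[i]`, which is the shape `as_matmul` can match.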
After implementing the changes above, `schedule/as_matmul` is expected to match this code to a `Matmul`, but it still can't deal with the code's derivatives. In order to accept such code in AD, we will need one more change:
- Simultaneously match multiple `Matmul`s in one `schedule/as_matmul` call, instead of relying on `#! prefer_libs` to fission the loops.
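The AD concern arises because the derivative of a matmul is itself a pair of matmuls, so the backward pass of a loop body like the one above contains several `Matmul` candidates at once. A NumPy check of the standard gradient identities (shapes assumed for illustration) makes this concrete: for `c = a @ b` with upstream gradient `g`, the backward pass computes `da = g @ b.T` and `db = a.T @ g`, i.e. two more matmuls in the same loop nest.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 6))
b = rng.standard_normal((6, 3))
g = rng.standard_normal((4, 3))  # upstream gradient of c = a @ b

# Backward pass of one matmul: two more matmuls.
da = g @ b.T
db = a.T @ g

# Sanity-check da against a finite-difference approximation of
# d/da sum((a @ b) * g).
eps = 1e-6
num = np.empty_like(a)
for idx in np.ndindex(a.shape):
    ap = a.copy(); ap[idx] += eps
    am = a.copy(); am[idx] -= eps
    num[idx] = (np.sum((ap @ b) * g) - np.sum((am @ b) * g)) / (2 * eps)
assert np.allclose(da, num, atol=1e-4)
```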
Another approach to solve the problem (without considering AD) is to add a new schedule `uncache` and run it automatically inside `as_matmul`. As the name suggests, `uncache` undoes the `cache` schedule: `uncache(v)` detects whether `v` maps to a parent `VarDef` `u`, and replaces `v` by `u` with corresponding indices.
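A minimal sketch of this idea on a toy statement list (the representation and helper names are invented for illustration; the real schedule would operate on the AST and handle index mapping properly): `cache` introduces a local copy `v` of an access to `u`, and `uncache(v)` drops the copy and rewrites uses of `v` back into the parent access.

```python
# Toy program: `cache` has introduced a local copy `v` of `u[i]`.
stmts = [
    ("v", "u[i]"),        # the cache copy: v maps to parent VarDef u
    ("c[i]", "v + 1"),
]

def uncache(stmts, var):
    """Sketch of `uncache`: drop the cache copy of `var` and replace
    its uses by the parent access it was copied from."""
    parent = next(expr for tgt, expr in stmts if tgt == var)
    return [(tgt, expr.replace(var, parent))
            for tgt, expr in stmts if tgt != var]

result = uncache(stmts, "v")
print(result)  # [('c[i]', 'u[i] + 1')]
```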