AssertionError when calling add_prefetch with SubArrayRefs
Attempting to add a prefetch to a kernel with a SubArrayRef leads to an assertion error. Some mapper is attempting to create a new SubArrayRef but passes None for both swept_inames and subscript.
Reproducer:
import loopy as lp

child_knl = lp.make_function(
    "[N] -> {[i]: 0<=i<N-1}",
    """
    g[i] = f[i] + f[i+1]
    """, [...], name="func")

knl = lp.make_kernel(
    "[N] -> {[i]: 0<=i<N-1}",
    """
    [i]: g[i] = func([i]: f[i])
    """,
    [
        lp.GlobalArg("f", shape=("N",)),
        lp.GlobalArg("g", shape=("N",)),
        ...
    ],
    options=lp.Options(write_cl=True),
)

knl = lp.merge([knl, child_knl])
knl = lp.split_iname(knl, "i", 32, outer_tag="g.0", inner_tag="l.0")
knl = lp.add_prefetch(knl, "f", "i_inner")
The same behavior occurs if I first merge knl with a kernel defining func.
Apologies if I'm misunderstanding/misusing things - flying a bit blind here. (If there happen to be examples for this type of usage anywhere, I'd be glad for them - couldn't find any in, e.g., test_callables.py.)
Hello @zachjweiner! There are a couple of issues here:
1. The issue you point out regarding prefetch over sub-array-refs is a bug.
2. There isn't good shape-inference support at a call site yet. So, solving (1) might not be fruitful yet, as I would expect that the value of "N" in the callee (func) would also have to be updated as we update the sub-array region passed in.
Thanks, @kaushikcfd! I'm still wrapping my head around the callables "model" at the moment.
In practice, all I'm actually looking to do (for now) is prefetch over inames that the callee isn't aware of - something like
child_knl = lp.make_function(
    "{:}",
    """
    g = f[0] + f[1]
    """, name="func")

knl = lp.make_kernel(
    "{[i] : 0 <= i < 32}",
    """
    g[i] = func(f[:, i])
    """,
    [
        lp.GlobalArg("f", shape=(2, 32)),
        lp.GlobalArg("g", shape=(32,))
    ],
    options=lp.Options(write_cl=True),
)

knl = lp.split_iname(knl, "i", 4, outer_tag="g.0", inner_tag="l.0")
knl = lp.add_prefetch(knl, "f", "i_inner")
knl = lp.merge([knl, child_knl])
If bug (1) were fixed, would this use case be able to skirt the shape-inference issues? I imagine it would if one inlined all called kernels (or redirected inputs via temporaries).
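For concreteness, the inlining route I have in mind would look roughly like the following. This is an untested sketch: I'm assuming lp.inline_callable_kernel is the right transform to use here and that it belongs before the split/prefetch.

import loopy as lp

child_knl = lp.make_function(
    "{:}",
    """
    g = f[0] + f[1]
    """, name="func")

knl = lp.make_kernel(
    "{[i] : 0 <= i < 32}",
    """
    g[i] = func(f[:, i])
    """,
    [
        lp.GlobalArg("f", shape=(2, 32)),
        lp.GlobalArg("g", shape=(32,))
    ],
    options=lp.Options(write_cl=True),
)

# merge the callee in and inline it, so that the call (and its
# sub-array-ref argument) is gone before any prefetching happens
knl = lp.merge([knl, child_knl])
knl = lp.inline_callable_kernel(knl, "func")

# with "func" inlined, "f" is accessed through ordinary subscripts,
# so the usual prefetch machinery should (I assume) apply as-is
knl = lp.split_iname(knl, "i", 4, outer_tag="g.0", inner_tag="l.0")
knl = lp.add_prefetch(knl, "f", "i_inner")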