loopy
loopy copied to clipboard
lp.precompute does not emit the correct `precompute_inames`
MWE:
import loopy as lp
knl = lp.make_kernel(
[
"{[iel, iface, idof, ifacedof]:"
" 0<=iface<4 and 0 <=iel<N_e and 0<=idof<20 and 0<=ifacedof<10}"],
"""
subst(f, e, j) := J[f, e] * u[e, j]
out[iel, idof] = sum([iface, ifacedof],\
subst(iface, iel, ifacedof) * R[iface, idof, ifacedof])
""")
knl = lp.split_iname(knl, "iel", 8, inner_tag="g.0", outer_tag="l.1")
knl = lp.precompute(knl, "subst",
sweep_inames=["ifacedof", "iel_inner"],
precompute_inames=["i0", "i1"],
precompute_outer_inames=frozenset(["iel_outer", "iface"]),
default_tag=None)
print(knl)
generates the following kernel:
---------------------------------------------------------------------------
INSTRUCTIONS:
for iel_outer, iface, i1, j
↱ subst_0[i1, j] = J[iface, i1 + 8*iel_outer]*u[i1 + 8*iel_outer, j] {id=subst}
│ end iface, i1, j
│ for idof, iel_inner
└ out[iel_inner + iel_outer*8, idof] = reduce(sum, [iface, ifacedof], subst_0[iel_inner, ifacedof]*R[iface, idof, ifacedof]) {id=insn}
end iel_outer, idof, iel_inner
---------------------------------------------------------------------------
Notice how it is subst_0[i1, j] and not subst_0[i0, i1].
I was able to do
knl = lp.precompute(knl, "subst",
sweep_inames=["ifacedof", "iel_inner"],
storage_axes=["e", "j"], # Had to add this argument
precompute_inames=["i0", "i1"],
precompute_outer_inames=frozenset(["iel_outer", "iface"]),
default_tag=None)
and get the desired precompute_inames. I think this behavior is unclear (and undocumented) i.e. should the precompute_inames be always accompanies with storage_axes, if so my earlier approach should have seen an exception/warning.