loopy
loopy copied to clipboard
`lp.privatize_temporaries_with_inames` fails for multiple inames sweeping the same temporary
For the kernel:
knl = lp.make_kernel(
"{[i,j]: 0<=i<10 and 10<=j<14}",
"""
for i
<> tmp = i
out[i] = tmp
end
for j
tmp = j
out[j] = tmp
end
""",
name="arange_14",
seq_dependencies=True)
lp.privatize_temporaries_with_inames(knl, {"i", "j"})
we get an error:
loopy.diagnostic.LoopyError: instruction 'insn' requires adding indices for privatizing var 'tmp' on iname(s) 'i', but previous instructions required inames 'j'
I kind agree with the error message here: after privatization, out will have shed its subscript, and so there's no possibility of retrieving it within a different loop. What would you like to happen in this case? Maybe call the transformation twice, with different values of within?
Privatize currently does not support a within. Besides that, supporting this via a within is sketchy since between the call to privatize_temporary_with_iname(..., within="iname:i") and privatize_temporary_with_iname(..., within="iname:j") the shape of tmp would be undefined. (?)
I think what should have happened here is that loopy should have realized that {[i0]: 0<=i0<10} would be the access map for tmp and accordingly expand the array.
I think what should have happened here is that loopy should have realized that
{[i0]: 0<=i0<10}would be the access map fortmpand accordingly expand the array.
But... the point of privatization is to get rid of array indices, right? And after the end of each iteration of the i loop, there's no possibility to recover the value of out[i]. And if the two uses of out were semantically sufficiently different that changes to one shouldn't be visible to the other, why wouldn't they be separate variables?
But... the point of privatization is to get rid of array indices, right?
Wouldn't privatization would add array axes instead of getting rid of them. I always thought it like array expansion.
In the ideal case, the transformed version of the above kernel would be:
knl = lp.make_kernel(
"{[i,j]: 0<=i<10 and 10<=j<14}",
"""
for i
<> tmp[i] = i
out[i] = tmp[i]
end
for j
tmp[j-10] = j
out[j] = tmp[j-10]
end
""")
Ah, nvm. I had that backwards. You're right. But I'm still not sure privatization applies in this case: The iname already indexes the variable in question. Privatization would try to add the same index, no?
The iname already indexes the variable in question
lp.privatize_temporaries_with_inames privatizes the temporaries. So out would remain untouched but tmp would be indexed, which in the untransformed kernel has the shape ().
:facepalm: Somehow I was staring at out the whole time. Sorry.
Would it be ok if the privatization of tmp within i and j went into separate variables?
Would it be ok if the privatization of tmp within i and j went into separate variables?
For my application, this kernel was obtained during pre-processing after i and j were tagged with ILP tags, so I don't have much control over the kernel that's fed to the privatization transformation. But.. I think this way of duplicating temporary sounds easier to implement than the access range checking.