loopy `lp.privatize_temporaries_with_inames` fails for multiple inames sweeping the same temporary

For the kernel:

knl = lp.make_kernel(
    "{[i,j]: 0<=i<10 and 10<=j<14}",
    """
    for i
        <> tmp = i
        out[i] = tmp
    end
    for j
        tmp = j
        out[j] = tmp
    end
    """,
    name="arange_14",
    seq_dependencies=True)

lp.privatize_temporaries_with_inames(knl, {"i", "j"})

we get an error:

loopy.diagnostic.LoopyError: instruction 'insn' requires adding indices for privatizing var 'tmp' on iname(s) 'i', but previous instructions required inames 'j'

Jun 14 '21 17:06 kaushikcfd

I kind agree with the error message here: after privatization, out will have shed its subscript, and so there's no possibility of retrieving it within a different loop. What would you like to happen in this case? Maybe call the transformation twice, with different values of within?

Jun 16 '21 21:06 inducer

Privatize currently does not support a within. Besides that, supporting this via a within is sketchy since between the call to privatize_temporary_with_iname(..., within="iname:i") and privatize_temporary_with_iname(..., within="iname:j") the shape of tmp would be undefined. (?)

I think what should have happened here is that loopy should have realized that {[i0]: 0<=i0<10} would be the access map for tmp and accordingly expand the array.

Jun 16 '21 22:06 kaushikcfd

I think what should have happened here is that loopy should have realized that {[i0]: 0<=i0<10} would be the access map for tmp and accordingly expand the array.

But... the point of privatization is to get rid of array indices, right? And after the end of each iteration of the i loop, there's no possibility to recover the value of out[i]. And if the two uses of out were semantically sufficiently different that changes to one shouldn't be visible to the other, why wouldn't they be separate variables?

Jun 16 '21 22:06 inducer

But... the point of privatization is to get rid of array indices, right?

Wouldn't privatization would add array axes instead of getting rid of them. I always thought it like array expansion.

In the ideal case, the transformed version of the above kernel would be:

knl = lp.make_kernel(
    "{[i,j]: 0<=i<10 and 10<=j<14}",
    """
    for i
        <> tmp[i] = i
        out[i] = tmp[i]
    end
    for j
        tmp[j-10] = j
        out[j] = tmp[j-10]
    end
    """)

Jun 16 '21 23:06 kaushikcfd

Ah, nvm. I had that backwards. You're right. But I'm still not sure privatization applies in this case: The iname already indexes the variable in question. Privatization would try to add the same index, no?

Jun 17 '21 04:06 inducer

The iname already indexes the variable in question

lp.privatize_temporaries_with_inames privatizes the temporaries. So out would remain untouched but tmp would be indexed, which in the untransformed kernel has the shape ().

Jun 17 '21 04:06 kaushikcfd

:facepalm: Somehow I was staring at out the whole time. Sorry.

Would it be ok if the privatization of tmp within i and j went into separate variables?

Jun 17 '21 05:06 inducer

Would it be ok if the privatization of tmp within i and j went into separate variables?

For my application, this kernel was obtained during pre-processing after i and j were tagged with ILP tags, so I don't have much control over the kernel that's fed to the privatization transformation. But.. I think this way of duplicating temporary sounds easier to implement than the access range checking.

Jun 17 '21 06:06 kaushikcfd

loopy loopy copied to clipboard

`lp.privatize_temporaries_with_inames` fails for multiple inames sweeping the same temporary

loopy
loopy copied to clipboard