loopy

Allow vec tagging of odd sizes for local temporaries

Open · isuruf opened this issue 2 years ago · 1 comment

For context, in a sumpy P2P kernel I have a temporary of size

local_isrc[5, 45]

which results in 5 memory loads/stores, but it could be split into

local_isrc_s0[2, 45]
local_isrc_s1[2, 45]
local_isrc_s2[1, 45]

which results in only 3 memory loads/stores.
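The load/store counts above can be checked with a small sketch. It assumes one memory access per chunk along the first axis and a vector width of 2; the helper name `split_axis` is hypothetical and is not part of loopy.

```python
def split_axis(length, width):
    """Split an axis of `length` into chunks of at most `width` elements.

    Hypothetical helper for illustration only; not a loopy function.
    """
    full, rem = divmod(length, width)
    # Full-width chunks first, then one remainder chunk if the length is odd.
    return [width] * full + ([rem] if rem else [])


chunks = split_axis(5, 2)
shapes = [(c, 45) for c in chunks]

print(shapes)       # shapes of local_isrc_s0, _s1, _s2
print(len(chunks))  # accesses after splitting, versus 5 before
```

Under these assumptions, the last chunk has width 1, so it would be a scalar access rather than a vector one.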

One way that I can achieve this is to do

knl = lp.split_array_axes(knl, "local_isrc", 0, 2)
knl = lp.tag_array_axes(knl, "local_isrc", "C,vec,C")

However, this results in 6*45 elements being allocated in shared memory, because the length-5 axis is rounded up to 3*2 = 6. (Sometimes the compiler optimizes this into 5, 45, sometimes not.)
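The size of the overallocation can be worked out with a short sketch. It assumes the split rounds the axis up to the next multiple of the vector width, which matches the 6*45 figure reported here; the helper name `padded_elements` is hypothetical and is not part of loopy.

```python
import math


def padded_elements(shape, axis, width):
    """Element count after rounding `shape[axis]` up to a multiple of `width`.

    Hypothetical helper for illustration only; not a loopy function.
    """
    n_chunks = -(-shape[axis] // width)  # ceiling division
    padded = list(shape)
    padded[axis] = n_chunks * width
    return math.prod(padded)


needed = math.prod((5, 45))
allocated = padded_elements((5, 45), axis=0, width=2)

print(needed, allocated, allocated - needed)
```

With these numbers, the padded layout holds one extra row of 45 elements that is never used.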

I tried

knl = lp.split_array_axes(knl, "local_isrc", 0, 2)
knl = lp.tag_array_axes(knl, "local_isrc", "sep,vec,C")

which does not work.

isuruf · May 30 '23 00:05

"Sometimes the compiler optimizes this into 5, 45, sometimes not"

It turns out that the compiler does optimize this predictably. I was looking at the wrong source code.

isuruf · May 30 '23 14:05