loopy
loopy copied to clipboard
Tighter write race checking has broken `test_argmax` in `test_scan`
This seems to happen repeatably on the Titan V.
Failure log: https://gitlab.tiker.net/inducer/loopy/-/jobs/400790
Reverting to 837e0d0002a84e2ef557e71e501e6f89076fb2cc (i.e. just before the merge of #224) clears the failure.
@kaushikcfd What should we do? Revert for now, or fix?
Found an MWE:
knl = lp.make_kernel(
["{[i]: 1<=i<17}",
"{[j]: 0<=j<16}"],
"""
a[i] = i
b[j] = 2*a[j]
""")
knl = lp.tag_inames(knl, {"i": "l.0", "j": "l.0"})
generates;
__kernel void __attribute__ ((reqd_work_group_size(16, 1, 1))) loopy_kernel(__global int *__restrict__ a, __global int *__restrict__ b)
{
a[lid(0) + 1] = lid(0) + 1;
b[lid(0)] = 2 * a[lid(0)];
}
Notice how an lbarrier wasn't emitted :astonished: even though there is a RAW race.
Loopy's code-generator skewing the access index for local hw-inames that a non-zero lbound was not accounted in the new write-race checker.
Nice catch! That makes sense. A good question is what we should do about that. We could apply that offset as a transformation before codegen, to preserve the literal iname <-> [gl].0
mapping, or hack the mapping into the write race checker.
The former comes at the cost of another traversal (at least when there are nonzero offsets, but that's rare enough I think), but is overall cleaner I think.
I still get an email (at least) once a week about how Loopy doesn't pass its GPU CI.
I'll unassign myself from this to avoid gatekeeping someone else from fixing this. I can fix this in late August. (need to cross off certain items from the graduation checklist)