loopy icon indicating copy to clipboard operation
loopy copied to clipboard

Tighter write race checking has broken `test_argmax` in `test_scan`

Open inducer opened this issue 2 years ago • 4 comments

This seems to happen repeatably on the Titan V.

Failure log: https://gitlab.tiker.net/inducer/loopy/-/jobs/400790

Reverting to 837e0d0002a84e2ef557e71e501e6f89076fb2cc (i.e. just before the merge of #224) clears the failure.

@kaushikcfd What should we do? Revert for now, or fix?

inducer avatar Apr 19 '22 21:04 inducer

Found an MWE:

knl = lp.make_kernel(
    ["{[i]: 1<=i<17}",
     "{[j]: 0<=j<16}"],
    """
    a[i] = i
    b[j] = 2*a[j]
    """)

knl = lp.tag_inames(knl, {"i": "l.0", "j": "l.0"})

generates;

__kernel void __attribute__ ((reqd_work_group_size(16, 1, 1))) loopy_kernel(__global int *__restrict__ a, __global int *__restrict__ b)
{
  a[lid(0) + 1] = lid(0) + 1;
  b[lid(0)] = 2 * a[lid(0)];
}

Notice how an lbarrier wasn't emitted :astonished: even though there is a RAW race.

Loopy's code-generator skewing the access index for local hw-inames that a non-zero lbound was not accounted in the new write-race checker.

kaushikcfd avatar Apr 20 '22 15:04 kaushikcfd

Nice catch! That makes sense. A good question is what we should do about that. We could apply that offset as a transformation before codegen, to preserve the literal iname <-> [gl].0 mapping, or hack the mapping into the write race checker.

The former comes at the cost of another traversal (at least when there are nonzero offsets, but that's rare enough I think), but is overall cleaner I think.

inducer avatar Apr 20 '22 15:04 inducer

I still get an email (at least) once a week about how Loopy doesn't pass its GPU CI.

inducer avatar Jun 06 '22 01:06 inducer

I'll unassign myself from this to avoid gatekeeping someone else from fixing this. I can fix this in late August. (need to cross off certain items from the graduation checklist)

kaushikcfd avatar Jul 06 '22 01:07 kaushikcfd