
Loopy's code generator fails with a non-deterministic error

Open kaushikcfd opened this issue 3 years ago • 3 comments

For this fairly simple kernel:

import loopy as lp
knl = lp.make_kernel(
    ["{ [i_outer, i_inner] : 4i_outer <= n - 5 and 0 <= i_inner < 4 and 0 <= i_inner + 4i_outer < n }",
     "{ [slab_i_outer] : n - 4 <= 4slab_i_outer < n }",
     "[slab_i_outer, n] -> { [slab_i_inner] : 0 <= slab_i_inner < 4 and 0 <= slab_i_inner + 4slab_i_outer <  n }"],
    """
    a[i_inner + 4*i_outer] = i_inner + 4*i_outer
    a[slab_i_inner + 4*slab_i_outer] = slab_i_inner + 4*slab_i_outer
    """, seq_dependencies=True)

knl = lp.tag_inames(knl, "i_outer:g.0, slab_i_outer:g.0")

lp.generate_code_v2(knl)

Depending on PYTHONHASHSEED, the error alternates between:

  • In setup_hw_parallel_loops
NotImplementedError: only single-valued piecewise affine expressions are supported here--encountered multi-valued expression '[n] -> { [(-2 + 2n - 2*floor((3n)/4))] : n >= 5; [(n - floor((3n)/4))] : n <= 4 }'
  • In check_implemented_domains
loopy.diagnostic.LoopyError: sanity check failed--implemented and desired domain for instruction 'insn_0' do not match
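
To see which seeds produce which error, the reproducer can be run once per PYTHONHASHSEED value and the runs grouped by their final exception line. This is only a rough sketch: the helper names are made up for illustration, and it assumes the kernel above has been saved to a standalone script.

```python
import os
import re
import subprocess
import sys
from collections import defaultdict

# Matches the final line of a Python traceback, e.g. "NotImplementedError: ..."
# or "loopy.diagnostic.LoopyError: ...". Indented frame lines do not match.
_EXC_LINE = re.compile(r"[\w.]+(Error|Exception)\b")


def final_exception_line(script_path, seed):
    """Run script_path under a fixed PYTHONHASHSEED and return the last
    exception line found on stderr ("" if the run printed none)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    proc = subprocess.run(
        [sys.executable, script_path],
        env=env, capture_output=True, text=True,
    )
    matches = [ln for ln in proc.stderr.splitlines() if _EXC_LINE.match(ln)]
    return matches[-1] if matches else ""


def group_seeds_by_error(script_path, seeds):
    """Map each distinct exception line to the list of seeds that produced it."""
    groups = defaultdict(list)
    for seed in seeds:
        groups[final_exception_line(script_path, seed)].append(seed)
    return dict(groups)
```

For example, `group_seeds_by_error("reproducer.py", range(32))` (with `reproducer.py` a hypothetical file holding the kernel above) should return one entry per distinct failure, which makes it easy to pick a seed that pins down either error.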

kaushikcfd, Jun 15 '21 05:06

I just ran this through the code generator in #372, which handles it correctly. At least from my end, debugging this on the current trunk seems unnecessary.

kaushikcfd, Jun 15 '21 05:06

With bd1ea33 (#422), where lpbug.py is the updated reproducer above:

PYTHONHASHSEED=22 pycl lpbug.py  

I get

-------------------------------------------------------------------------------
CODE:
-------------------------------------------------------------------------------
__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1))) loopy_kernel(__global int *__restrict__ a, int const n)
{
  for (int i_inner = 0; i_inner <= 3; ++i_inner)
    a[4 * gid(0) + i_inner] = i_inner + 4 * gid(0);
  if (-1 + -4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4)) + n >= 0)
    for (int slab_i_inner = 0; slab_i_inner <= -1 + n + -4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4)); ++slab_i_inner)
      a[4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4)) + slab_i_inner] = slab_i_inner + 4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4));
}

-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/andreas/tmp/lpbug.py", line 13, in <module>
    lp.generate_code_v2(knl)
  File "/home/andreas/src/loopy/loopy/codegen/__init__.py", line 777, in generate_code_v2
    cgr = generate_code_for_a_single_kernel(program[func_id],
  File "/home/andreas/src/loopy/loopy/codegen/__init__.py", line 557, in generate_code_for_a_single_kernel
    assert check_implemented_domains(kernel, codegen_result.implemented_domains,
  File "/home/andreas/src/loopy/loopy/check.py", line 1542, in check_implemented_domains
    raise LoopyError("sanity check failed--implemented and desired "
loopy.diagnostic.LoopyError: sanity check failed--implemented and desired domain for instruction 'insn_0' do not match

implemented: [n] -> { [slab_i_outer, slab_i_inner] : n >= 5 and 4slab_i_outer >= -4 + n and 0 <= slab_i_inner < n - 4slab_i_outer and 2*floor((-n)/4) <= -3 - slab_i_outer }

desired:[n] -> { [slab_i_outer, slab_i_inner] : 4slab_i_outer >= -4 + n and slab_i_inner >= 0 and -4slab_i_outer <= slab_i_inner < n - 4slab_i_outer }

sample point in desired but not implemented: slab_i_inner=0, slab_i_outer=0, n=4
gist of constraints in desired but not implemented: [n] -> { [slab_i_outer, slab_i_inner] : n >= 5 }
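
The mismatch can be checked by hand without isl. The sketch below transcribes the two printed sets into plain Python predicates (the function names are only illustrative, and `floor((-n)/4)` is written as `(-n) // 4`, which floors in Python) and brute-forces a small box of integer points.

```python
def in_desired(n, so, si):
    """Transcription of the 'desired' set (so = slab_i_outer, si = slab_i_inner)."""
    return 4 * so >= n - 4 and si >= 0 and -4 * so <= si < n - 4 * so


def in_implemented(n, so, si):
    """Transcription of the 'implemented' set; (-n) // 4 is floor((-n)/4)."""
    return (
        n >= 5
        and 4 * so >= n - 4
        and 0 <= si < n - 4 * so
        and 2 * ((-n) // 4) <= -3 - so
    )


# Enumerate a small box of (n, slab_i_outer, slab_i_inner) points.
box = [
    (n, so, si)
    for n in range(0, 13)
    for so in range(-2, 5)
    for si in range(-1, 6)
]

# Points that should be executed but are not covered by the generated code.
missing = [p for p in box if in_desired(*p) and not in_implemented(*p)]
# Points the generated code would execute that were not asked for.
extra = [p for p in box if in_implemented(*p) and not in_desired(*p)]

print("missing:", missing)
print("extra:", extra)
```

Within this box, every missing point has n <= 4 and there are no extra points, which agrees with the reported gist `n >= 5`: the implemented domain only drops the small-n cases, such as the sample point n=4, slab_i_outer=0, slab_i_inner=0.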

inducer, Jun 16 '21 22:06

I guess I agree with you. Given that this code is getting replaced, let's not worry about it.

inducer, Jun 16 '21 22:06