loopy icon indicating copy to clipboard operation
loopy copied to clipboard

Implements CVectorExtensionsTarget

Open kaushikcfd opened this issue 2 years ago • 7 comments

/cc @sv2518

Adds support for GNU vector extensions.

TODO:

  • [x] Implement OMPSimdInameTag.
  • [x] In loopy.codegen.expression infer the fallback mechanisms from the target.
  • [x] Pass CI.
  • [ ] Includes commits from https://github.com/inducer/loopy/pull/650

kaushikcfd avatar Mar 02 '22 15:03 kaushikcfd

This is ready for a look, for a better reviewing experience please see the patch on a commit-by-commit basis.

kaushikcfd avatar Mar 03 '22 20:03 kaushikcfd

There are two things which we definitely still need before we are able to fully automate this into the Firedrake/PyOP2 code.

  • [x] The first one is that conditionals cannot be vectorised yet. There is an error in Firedrake with ``pymbolic.mapper.UnsupportedExpressionError: <class 'loopy.expression.VectorizabilityChecker'> cannot handle expressions of type <class 'pymbolic.primitives.If'> An example of a test where we run into that is this one: tests/extrusion/test_mixed_periodic.py::test_mixed_periodic[interval]

--> Fix in PR

  • [x] That math functions are not vectorised is also still missing. We run into an error:
passing '__attribute__((__vector_size__(4 * sizeof(double)))) double' (vector of 4 'double' values) to parameter of incompatible type 'double'
   t0[expr_p0] = t0[expr_p0] + 6.283185307179586 * cos(expr_t1); 

An example of a test where we run into that is tests/slate/test_slate_infrastructure.py::test_arguments[dg1-mesh0]

--> Fix in PR

  • [x] Two other I believe faster to fix issues are one that is related to complex types on AVX512. We have an error that looks like the following so I think there is some support missing for complex128. Maybe we should not vectorised for complex in Firedrake?
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/loopy/target/c_vector_extensions.py", line 107, in vector_dtype 
 vec.types[base.numpy_dtype, count], 
KeyError: (dtype('complex128'), 8)

--> This was a bug on our side. Fixed in PyOP2

  • [x] The other one I don’t quite understand but it is a gcc compiler error
error: use of undeclared identifier 'iel_batch'
   t0[0] = t0[0] + dat0[4 * iel_outer + iel_batch + start] * dat1[4 * iel_outer + iel_batch + start]; I had a look at the C code and I think an iname has been dropped
/* bulk slab for 'iel_outer' */
 for (int32_t iel_outer = 1; iel_outer <= -2 + -1 * start + (3 + end + 3 * start) / 4; ++iel_outer)
 {
  {
   int32_t const i4 = 0;

   #pragma omp simd
   for (int32_t iel_batch = 0; iel_batch <= 3; ++iel_batch)
    (t0[0])[iel_batch] = 0.0;
  }
  /* no-op (insn=inne__start) */
  {
   int32_t const inne_i = 0;

   t0[0] = t0[0] + dat0[4 * iel_outer + iel_batch + start] * dat1[4 * iel_outer + iel_batch + start];
  }
...

It looks like the iname for iel_batch was dropped

sv2518 avatar Mar 08 '22 13:03 sv2518

So I had a look at the code. I think I can fix the first issue by adding a map_if to VectorizabilityChecker that throws a UnvectorizableError(). Does that sound about right? For the second issue I think we need to check in map_variable if the variable is one of these

func_names = set(["abs_*", "fabs_*", "cos_*", "sin_*", "exp_*", "pow_*",
                       "sqrt_*", "fmax_*", "fmin_*", "atan2_*", "log_*",
                       "tanh_*"])

(and potentially other supported math functions) and throw a UnvectorizableError() if it is. If you agree with that I can write the code to address both.

sv2518 avatar Mar 09 '22 10:03 sv2518

It looks like the iname for iel_batch was dropped

This looks like a bug in loopy's vectorization implementation, with reductions in them.

(and potentially other supported math functions) and throw a UnvectorizableError() if it is.

I think falling back to omp-simd might make more sense. Looks like that's already being done here: https://github.com/inducer/loopy/blob/dd1ad1856467e9370b76854fbfbddc5230c29f34/loopy/expression.py#L91-L97

kaushikcfd avatar Mar 11 '22 16:03 kaushikcfd

From the Firedrake side this looks like its good to go. Thanks for all your work on it Kaushik! We will update our fork as soon as this is merged on your main and then merge the corresponding PyOP2 PR.

sv2518 avatar May 23 '22 13:05 sv2518

Some tests in the actions for the automatic vectorisation of Firedrake are currently failing due to the issue I reported in https://github.com/inducer/loopy/issues/648

sv2518 avatar Jul 13 '22 09:07 sv2518

Hi! I have tested the updated branch together with the updated version of Firedrake and PyOP2 and both CIs are passing. The PRs in both components were already approved and I am not around for long anymore. It would be awesome if you could merge this into Loo.py, so that we can merge it on our side and make vectorisation available to all of our users.

sv2518 avatar Oct 25 '22 11:10 sv2518