RAJA icon indicating copy to clipboard operation
RAJA copied to clipboard

Two issues with OpenMP target back end reduce class

Open mcordery opened this issue 4 years ago • 4 comments

  1. Ifdef fences for ibmxl around omp declare target keeps other compilers from correctly creating device functions. There should be a omp declare target around the ReduceLoc destructor as well.
  2. In the OpenMP target ReduceSum tests, the variable sum is used in one kernel, then has its value reset via reset() and then is used in another kernel. The issues with this is that (a) reset() does not set isMapped back to false before the 2nd kernel is called and (b) the test grabs the value of the reduction in the first kernel and uses in an assert. This causes cleanup() to be called which frees up the memory allocations on the host and device. Either the tests need to use new variables or reset() needs to be refactored so that the appropriate memory allocations are made. Alternatively, cleanup() would only be called when the host side destructor is called.

mcordery avatar Jan 16 '21 00:01 mcordery

  1. Those fences were there because xl didn't correctly apply implicit declare target. Where, specifically, are you running into issues with those?
  2. sounds like reset() needs to be fixed

trws avatar Jan 19 '21 19:01 trws

  1. We were hitting issues because the intel compilers don't have any 'implicit declare target' capability and the fence for the pragma is only for ibm so we basically had to throw -D__ibmxl__ in order to get it to compile. Not a big deal, just something that needed 'fixing'.
  2. poking at it at the moment but not quite sure how you guys would want to handle this going forward. Is the plan to use TargetReduce<T>.reset() only for those cases where you've already brought the data back? That would seem to be the most logical use of it. I don't see a use case for resetting the value before a kernel has been called at all. If so, it should be ok to add a bit to pass the offload data reference info to the Reduce_Data reset and then use that to copy re-initialize the device memory and copy the values back and reset the host values.

mcordery avatar Jan 19 '21 21:01 mcordery

So I have an updated version of reduce.hpp that modifies the Reduce class reset function. It did solve the ReduceSum problem that I'm seeing and it should have solve the ReduceMin/Max problem as well but they're still there. Are you guys relying on clang's -fimplicit-declare-target to map everything most of the time? I have this sneaky feeling min/max (and possibly other) functions might be missing.

mcordery avatar Jan 27 '21 21:01 mcordery

We do rely on implicit declare target. That shouldn't require the compiler flag anymore, but we do need that behavior.

trws avatar Jan 27 '21 21:01 trws