RAJA Document the generalized usage of ReduceMaxLoc

I have a few use cases where it would be nice to get related value(s) out of a reduction. For example, when computing the max absolute difference of two arrays, I often want to know not only the max difference but also the values at the index where the arrays differ the most. For arrays on the CPU this is easy enough to do as follows:

std::array< double, 2 > maxDiff( int size, double * a, double * b )
{
  RAJA::ReduceMaxLoc< RAJA::seq_reduce, double > maxDiff( 0, 0 );

  RAJA::forall< RAJA::loop_exec >( RAJA::RangeSegment( 0, size ), [=] ( int i )
  {
    maxDiff.maxloc( std::abs( a[ i ] - b[ i ] ), i );
  } );

  int location = maxDiff.getLoc();
  return { a[ location ], b[ location ] };
}

But this won't work with device execution unless you have unified memory: a and b live in device memory, so reading a[ location ] on the host would segfault. I'm not sure whether the reduction location machinery only works for numeric values under the hood, but if it can be made general, it would be pretty nice to be able to do something like the following (although in this new case names such as ReduceMaxLoc and getLoc would no longer be appropriate).

std::array< double, 2 > maxDiff( int size, double * a, double * b )
{
  RAJA::ReduceMaxLoc< RAJA::cuda_reduce, double, std::array< double, 2 > > maxDiff( 0, { 0, 0 } );

  RAJA::forall< RAJA::cuda_exec< 32 > >( RAJA::RangeSegment( 0, size ), [=] __device__ ( int i )
  {
    maxDiff.maxloc( std::abs( a[ i ] - b[ i ] ), { a[ i ], b[ i ] } );
  } );

  return maxDiff.getLoc();
}

May 26 '21 01:05 corbett5

We actually discussed something like this in relation to what we would do for the new reduction API, though we hadn't decided exactly how to handle it. There's no fundamental reason we couldn't allow an arbitrary second type in there (with some requirements, such as assignability), but there are some support mechanisms in there we might have to fix. If this is something you're interested in spending a bit of time on, it should be a relatively easy refactor, and I could point you to where tweaks would need to go.
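To make the "arbitrary second type with some requirements" idea concrete, here is a minimal host-only sketch in plain C++. It is not RAJA code and does not reflect how RAJA implements its reducers: the class name MaxWithPayload, the struct ValuePair, and the function maxDiffPair are all hypothetical, and the only requirements assumed on the second type are that it is copy-constructible and copy-assignable.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a "max with payload" reducer; NOT RAJA's API.
template <typename T, typename Payload>
class MaxWithPayload {
  // Assumed requirements on the second type (illustrative only).
  static_assert(std::is_copy_constructible<Payload>::value,
                "Payload must be copy-constructible");
  static_assert(std::is_copy_assignable<Payload>::value,
                "Payload must be copy-assignable");

public:
  MaxWithPayload(T init, const Payload& initPayload)
      : value_(init), payload_(initPayload) {}

  // Keep the payload that accompanies the largest value seen so far.
  void maxloc(T candidate, const Payload& payload) {
    if (candidate > value_) {
      value_ = candidate;
      payload_ = payload;
    }
  }

  T get() const { return value_; }
  Payload getLoc() const { return payload_; }

private:
  T value_;
  Payload payload_;
};

// Plain aggregate used as the payload (hypothetical name).
struct ValuePair {
  double a;
  double b;
};

// Returns the pair of values at the index of the largest |a[i] - b[i]|.
ValuePair maxDiffPair(std::size_t size, const double* a, const double* b) {
  assert(size == 0 || (a != nullptr && b != nullptr));
  // Absolute differences are non-negative, so -1 loses to any real candidate.
  MaxWithPayload<double, ValuePair> reducer(-1.0, ValuePair{0.0, 0.0});
  for (std::size_t i = 0; i < size; ++i) {
    reducer.maxloc(std::abs(a[i] - b[i]), ValuePair{a[i], b[i]});
  }
  return reducer.getLoc();
}
```

Because the payload is carried along by value, nothing has to be read back out of a or b after the loop finishes, which is the property the original request is after.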

May 26 '21 13:05 trws

@corbett5 if you know the number of indices you want to get back, for example, the RAJA loc reductions support this. Here's an example from one of our tests. I would think this would work with a struct of non-int types, such as doubles. It's not going to work using std::array on a CUDA device, however.
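As a rough illustration of why a plain struct of doubles is a natural fit here, the sketch below (plain host C++, not the RAJA test referred to above) reduces the arrays chunk by chunk and then merges the partial results. It assumes the common pattern where a parallel reduction combines per-thread or per-block partials by copying them; the names DiffPair, Partial, combine, reduceChunk, and reduceInChunks are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>

// Payload: a plain struct of doubles (hypothetical name).
struct DiffPair {
  double a;
  double b;
};

// A partial result: the best value found so far plus its payload.
struct Partial {
  double maxAbsDiff;
  DiffPair values;
};

// Trivially copyable types can be moved between partial results bitwise.
static_assert(std::is_trivially_copyable<Partial>::value,
              "Partial should be trivially copyable");

// Merge two partial results; ties keep the left-hand (earlier) one.
Partial combine(const Partial& lhs, const Partial& rhs) {
  return rhs.maxAbsDiff > lhs.maxAbsDiff ? rhs : lhs;
}

// Reduce the half-open index range [begin, end).
Partial reduceChunk(const double* a, const double* b,
                    std::size_t begin, std::size_t end) {
  Partial best{-1.0, {0.0, 0.0}};  // -1 loses to any absolute difference
  for (std::size_t i = begin; i < end; ++i) {
    Partial candidate{std::abs(a[i] - b[i]), {a[i], b[i]}};
    best = combine(best, candidate);
  }
  return best;
}

// Reduce each chunk independently, then merge the partial results.
Partial reduceInChunks(const double* a, const double* b,
                       std::size_t size, std::size_t chunk) {
  assert(chunk > 0);
  Partial best{-1.0, {0.0, 0.0}};
  for (std::size_t begin = 0; begin < size; begin += chunk) {
    std::size_t end = std::min(begin + chunk, size);
    best = combine(best, reduceChunk(a, b, begin, end));
  }
  return best;
}
```

The chunked result matches a single pass over the whole range, since combine only ever copies whole Partial values and never needs to index back into a or b.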

May 26 '21 15:05 rhornung67

@trws this isn't a must-have and I'm pretty busy at the moment, so I don't think I'll get around to it any time soon.

@rhornung67 cool! Can you point me at the example?

May 27 '21 06:05 corbett5

@corbett5 sorry. Here's a link to the example: https://github.com/LLNL/RAJA/blob/develop/test/functional/kernel/reduce-loc/tests/test-kernel-reduceloc-Max2DViewTuple.hpp

May 27 '21 15:05 rhornung67

That does the trick, very cool!

My question has been answered, but I'll leave this open as a documentation issue.

May 28 '21 02:05 corbett5

@corbett5 can you send me a code snippet of your use case? I will add it to the tests and document it in the user guide.

May 28 '21 14:05 rhornung67