LvArray
Simplified user interface for tensorOps functions
The tensorOps functions typically require one or more integer template arguments to set the bounds for the function. Take for example:
```cpp
template< std::ptrdiff_t ISIZE, typename VECTOR >
LVARRAY_HOST_DEVICE CONSTEXPR_WITHOUT_BOUNDS_CHECK inline
auto l2NormSquared( VECTOR const & vector )
{
  static_assert( ISIZE > 0, "ISIZE must be greater than zero." );
  internal::checkSizes< ISIZE >( vector );
  auto norm = vector[ 0 ] * vector[ 0 ];
  for( std::ptrdiff_t i = 1; i < ISIZE; ++i )
  {
    norm = norm + vector[ i ] * vector[ i ];
  }
  return norm;
}
```
The call looks something like `l2NormSquared< 3 >( array )`, where the `N` (`3`) is always required, and the type `VECTOR` is deduced. This is a little bit clunky imo. We could do something where we deduce both a size and a type. Like so:
```cpp
#include <tuple>
#include <iostream>
#include <type_traits>

// Detects a static constexpr size() member, falling back to sizeof
// arithmetic for c-arrays.
template< typename _T >
struct HasMemberFunction_size_impl
{
private:
  template< typename CLASS >
  static constexpr auto test( int ) -> decltype( std::is_convertible< decltype( std::declval< CLASS >().size() ), int >::value, int() )
  { return CLASS::size(); }

  template< typename CLASS >
  static constexpr auto test( ... ) -> int
  { return sizeof( CLASS ) / sizeof( std::declval< CLASS >()[ 0 ] ); }

public:
  static constexpr int value = test< _T >( 0 );
};

template< typename CLASS >
static constexpr int HasMemberFunction_size = HasMemberFunction_size_impl< CLASS >::value;

template< int N >
struct Tensor
{
  static constexpr int size()
  { return N; }

  double & operator[]( int const i )
  { return data[ i ]; }

  double const & operator[]( int const i ) const
  { return data[ i ]; }

  double data[ N ];
};

template< typename T, int N >
double func_impl( T const & array )
{
  double rvalue = array[ 0 ] * array[ 0 ];
  for( int i = 1; i < N; ++i )
  {
    rvalue += array[ i ] * array[ i ];
  }
  return rvalue;
}

// Reference version: N is deduced through HasMemberFunction_size.
template< typename T, int N = HasMemberFunction_size< T > >
typename std::enable_if< !std::is_pointer< T >::value, double >::type func( T const & array )
{
  std::cout << "Calling reference version" << std::endl;
  return func_impl< T, N >( array );
}

// Pointer version: N must be specified explicitly.
template< int N, typename T >
typename std::enable_if< std::is_pointer< T >::value, double >::type func( T const array )
{
  std::cout << "Calling pointer version" << std::endl;
  return func_impl< T, N >( array );
}

////////////////////////////////////////////

int main()
{
  Tensor< 3 > t;
  t[ 0 ] = 1;
  t[ 1 ] = 2;
  t[ 2 ] = 3;

  double a[ 3 ] = { 1, 2, 3 };
  double * const pa = a;

  std::cout << func( t ) << std::endl;
  std::cout << func( a ) << std::endl;
  std::cout << func< 3 >( pa ) << std::endl;
}
```
Here it is in a compiler explorer: https://godbolt.org/z/K1P4T9qdE
If we were to standardize the way we return compile-time sizes, this works fairly well. With a raw pointer you would still require the specification of `N`. With multiple dimensions this would be more complicated... but I don't think it is too prohibitive, and the usability is certainly nicer.
@corbett5 @klevzoff Thoughts?
I think this is cleaner, it doesn't require detection and only uses one extra function instead of two. https://godbolt.org/z/3xzMnhYc7

I'm not convinced though. If there was a way to do it without requiring the extra function to reorder `N` and `T` when deducing `N`, I'd be into it. But there are a lot of functions in `tensorOps`, and wrapping all of them would be a pain. Plus you'd still have to explicitly specify the sizes anytime one of the first few arguments is an `Array` class.
I.e. for matrix multiplication

```cpp
template< int ISIZE = Size0< DST_MATRIX >, int JSIZE = Size1< DST_MATRIX >, int KSIZE = Size2< MATRIX_A >, ... >
void Rij_eq_AikBkj( DST_MATRIX && dstMatrix, MATRIX_A const & matrixA, MATRIX_B const & matrixB );
```
you only get anything if `dstMatrix` is a c-array and so is `matrixA`.
@corbett5 Odd... I couldn't get the overload to work when I tried it, but now it works fine. Must be my cognitive decline coming into play.
I am looking at this from a usage perspective. It is nice to not have to specify the sizes if they can be deduced. In the case of an ArraySlice/ArrayView/C-array/R1Tensor, they can be deduced. The cost is that we have to:

- provide inspection capabilities for the types we want to deduce;
- maintain a function for the non-deduced types to call the "impl/deduced" function;
- accept that this is more complicated for multidimensional objects, although it is straightforward as long as the `Size0, Size1, ...` inspection capabilities exist;
- accept that mixing deduced and non-deduced types probably won't work.

Is this a fair set of statements?
The sizes of an ArraySlice/ArrayView cannot be deduced at compile time.
Duh. I guess we would have to wait until we have compile-time dims added.
Did you ever put together a proposal for compile time dims, or are we waiting for RAJA to do it first?
No, I could do it, but it would take a fair bit of time. You could do it in steps though, the first of which would be to get rid of the strides.
If you keep the strides you get the advantage of deduction and not having to resize, and it would be much quicker to implement. The indexing math would be unchanged.
I still think it is worth doing, adding to the deducible types as they become available. I just really dislike having to specify the dims for something that has fixed dims at compile time and can be deduced.