LvArray
Simplified user interface for tensorOps functions
The tensorOps functions typically require one or more integer template arguments to set the bounds for the function. Take for example:
```cpp
template< std::ptrdiff_t ISIZE, typename VECTOR >
LVARRAY_HOST_DEVICE CONSTEXPR_WITHOUT_BOUNDS_CHECK inline
auto l2NormSquared( VECTOR const & vector )
{
  static_assert( ISIZE > 0, "ISIZE must be greater than zero." );
  internal::checkSizes< ISIZE >( vector );
  auto norm = vector[ 0 ] * vector[ 0 ];
  for( std::ptrdiff_t i = 1; i < ISIZE; ++i )
  {
    norm = norm + vector[ i ] * vector[ i ];
  }
  return norm;
}
```
The call looks something like `l2NormSquared< 3 >( array )`, where the `N` (`3`) is always required, and the type `VECTOR` is deduced. This is a little bit clunky imo. We could do something where we deduce both a size and a type. Like so:
```cpp
#include <tuple>
#include <iostream>
#include <type_traits>

// Detects a static constexpr size() member, falling back to sizeof
// arithmetic for c-arrays.
template< typename _T >
struct HasMemberFunction_size_impl
{
private:
  template< typename CLASS >
  static constexpr auto test( int ) -> decltype( std::is_convertible< decltype( std::declval< CLASS >().size() ), int >::value, int() )
  { return CLASS::size(); }

  template< typename CLASS >
  static constexpr auto test( ... ) -> int
  { return sizeof( CLASS ) / sizeof( std::declval< CLASS >()[ 0 ] ); }

public:
  static constexpr int value = test< _T >( 0 );
};

template< typename CLASS >
static constexpr int HasMemberFunction_size = HasMemberFunction_size_impl< CLASS >::value;

template< int N >
struct Tensor
{
  static constexpr int size()
  { return N; }

  double & operator[]( int const i )
  { return data[ i ]; }

  double const & operator[]( int const i ) const
  { return data[ i ]; }

  double data[ N ];
};

template< typename T, int N >
double func_impl( T const & array )
{
  double rvalue = array[ 0 ] * array[ 0 ];
  for( int i = 1; i < N; ++i )
  {
    rvalue += array[ i ] * array[ i ];
  }
  return rvalue;
}

// Reference version: N is deduced through HasMemberFunction_size.
template< typename T, int N = HasMemberFunction_size< T > >
typename std::enable_if< !std::is_pointer< T >::value, double >::type func( T const & array )
{
  std::cout << "Calling reference version" << std::endl;
  return func_impl< T, N >( array );
}

// Pointer version: N must be specified explicitly.
template< int N, typename T >
typename std::enable_if< std::is_pointer< T >::value, double >::type func( T const array )
{
  std::cout << "Calling pointer version" << std::endl;
  return func_impl< T, N >( array );
}

////////////////////////////////////////////

int main()
{
  Tensor< 3 > t;
  t[ 0 ] = 1;
  t[ 1 ] = 2;
  t[ 2 ] = 3;

  double a[ 3 ] = { 1, 2, 3 };
  double * const pa = a;

  std::cout << func( t ) << std::endl;
  std::cout << func( a ) << std::endl;
  std::cout << func< 3 >( pa ) << std::endl;
}
```
Here it is in a compiler explorer: https://godbolt.org/z/K1P4T9qdE
If we were to standardize the way we return compile-time sizes, this works fairly well. With a raw pointer you would still require the specification of `N`. With multiple dimensions this would be more complicated... but I don't think it is too prohibitive, and the usability is certainly nicer.
@corbett5 @klevzoff Thoughts?
I think this is cleaner, it doesn't require detection and only uses one extra function instead of two. https://godbolt.org/z/3xzMnhYc7

I'm not convinced though. If there was a way to do it without requiring the extra function to reorder `N` and `T` when deducing `N`, I'd be into it. But there are a lot of functions in `tensorOps`, and wrapping all of them would be a pain. Plus you'd still have to explicitly specify the sizes anytime one of the first few arguments is an `Array` class.
I.e. for matrix multiplication

```cpp
template< int ISIZE = Size0< DST_MATRIX >, int JSIZE = Size1< DST_MATRIX >, int KSIZE = Size2< MATRIX_A >, ... >
void Rij_eq_AikBkj( DST_MATRIX && dstMatrix, MATRIX_A const & matrixA, MATRIX_B const & matrixB );
```
you only get anything if `dstMatrix` is a c-array and so is `matrixA`.
@corbett5 Odd... I couldn't get the overload to work when I tried it, but now it works fine. Must be my cognitive decline coming into play.
I am looking at this from a usage perspective. It is nice to not have to specify the sizes if they can be deduced. In the case of an ArraySlice/ArrayView/C-array/R1Tensor, they can be deduced. The cost is that we have to:

- provide inspection capabilities for the types we want to deduce;
- maintain a function for the non-deduced types to call the "impl/deduced" function;
- accept that this is more complicated for multidimensional objects, although it is straightforward as long as the `Size0, Size1, ...` inspection capabilities exist;
- accept that mixing deduced and non-deduced types probably won't work.

Is this a fair set of statements?
The sizes of an ArraySlice/ArrayView cannot be deduced at compile time.
Duh. I guess we would have to wait until we have compile-time dims added.
Did you ever put together a proposal for compile time dims, or are we waiting for RAJA to do it first?
No, I could do it, but it would take a fair bit of time. You could do it in steps though, the first of which would be to get rid of the strides.
If you keep the strides you get the advantage of deduction and not having to resize, and it would be much quicker to implement. The indexing math would be unchanged.
I still think it is worth doing, adding to the deducible types as they become available. I just really dislike having to specify the dims for something that has fixed dims at compile time and can be deduced.