LvArray

Add tensorOps benchmarks

Open corbett5 opened this issue 4 years ago • 5 comments

Also examine using std::fma.

corbett5 avatar Jul 10 '20 04:07 corbett5

What did you have in mind for std::fma for device kernels?

rrsettgast avatar Jul 12 '20 00:07 rrsettgast

CUDA has a fma as well, just like cos and whatnot. I'm not sure it would be beneficial but worth checking out.

corbett5 avatar Jul 12 '20 01:07 corbett5
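As a rough illustration of what calling fma from code shared between host and device could look like, here is a minimal sketch. The macro `EXAMPLE_HOST_DEVICE` and the function `fusedMultiplyAdd` are hypothetical names made up for this example and are not taken from LvArray; the sketch assumes the global `fma( double, double, double )` from `<math.h>`, which CUDA also provides in device code.

```cpp
#include <math.h>

// Hypothetical macro for this sketch only: expands to the CUDA qualifiers
// when compiled with nvcc and to nothing for a plain host compiler.
#if defined( __CUDACC__ )
  #define EXAMPLE_HOST_DEVICE __host__ __device__
#else
  #define EXAMPLE_HOST_DEVICE
#endif

// Hypothetical wrapper: computes a * b + c with a single rounding.
EXAMPLE_HOST_DEVICE inline double fusedMultiplyAdd( double a, double b, double c )
{
  return fma( a, b, c );
}
```

Whether this beats letting the compiler contract `a * b + c` on its own is exactly what the benchmarks would need to show.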

I suspect that it may force the compiler to recognize the fma operation when it might otherwise miss it? We are getting all sorts of DFMA instructions in our CUDA PTX, but I was pretty careful about checking that we are getting them when we expect.

rrsettgast avatar Jul 12 '20 05:07 rrsettgast
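For context on what an explicit call guarantees: `std::fma( a, b, c )` is specified to compute `a * b + c` as if to infinite precision and round once, regardless of whether the compiler would have contracted the plain expression. Whether that maps to a hardware instruction depends on the target; without hardware support it may fall back to a slower software routine. The sketch below (the helper name `productRoundingError` is hypothetical) uses the single rounding to recover the exact rounding error of a product, something a separately rounded multiply and subtract cannot do.

```cpp
#include <cmath>

// Hypothetical helper: returns the exact rounding error of the double
// product a * b, i.e. ( a * b computed exactly ) - ( a * b rounded ).
inline double productRoundingError( double a, double b )
{
  double const p = a * b;        // product rounded to double
  return std::fma( a, b, -p );   // exact a * b minus p, rounded once
}
```

A nonzero result for inputs whose product is not exactly representable is one way to confirm that a fused operation is really being performed.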

Yeah, but it could be slower: https://stackoverflow.com/questions/34265982/automatically-generate-fma-instructions-in-msvc. For things like AiBi it is very applicable. But applying it to things like

dstSymMatrix[ 3 ] = matrixA[ 1 ][ 0 ] * symMatrixB[ 0 ] * matrixA[ 2 ][ 0 ] +
                    matrixA[ 1 ][ 0 ] * symMatrixB[ 5 ] * matrixA[ 2 ][ 1 ] +
                    matrixA[ 1 ][ 0 ] * symMatrixB[ 4 ] * matrixA[ 2 ][ 2 ] +
                    matrixA[ 1 ][ 1 ] * symMatrixB[ 5 ] * matrixA[ 2 ][ 0 ] +
                    matrixA[ 1 ][ 1 ] * symMatrixB[ 1 ] * matrixA[ 2 ][ 1 ] +
                    matrixA[ 1 ][ 1 ] * symMatrixB[ 3 ] * matrixA[ 2 ][ 2 ] +
                    matrixA[ 1 ][ 2 ] * symMatrixB[ 4 ] * matrixA[ 2 ][ 0 ] +
                    matrixA[ 1 ][ 2 ] * symMatrixB[ 3 ] * matrixA[ 2 ][ 1 ] +
                    matrixA[ 1 ][ 2 ] * symMatrixB[ 2 ] * matrixA[ 2 ][ 2 ];

might harm performance even if std::fma is fast, because it limits the rearranging the compiler can do.

corbett5 avatar Jul 12 '20 05:07 corbett5

Without fma I count 27 fp operations.

dstSymMatrix[ 3 ] = matrixA[ 1 ][ 0 ] * ( symMatrixB[ 0 ] * matrixA[ 2 ][ 0 ] +
                                          symMatrixB[ 5 ] * matrixA[ 2 ][ 1 ] +
                                          symMatrixB[ 4 ] * matrixA[ 2 ][ 2 ] ) +
                    matrixA[ 1 ][ 1 ] * ( symMatrixB[ 5 ] * matrixA[ 2 ][ 0 ] +
                                          symMatrixB[ 1 ] * matrixA[ 2 ][ 1 ] +
                                          symMatrixB[ 3 ] * matrixA[ 2 ][ 2 ] ) +
                    matrixA[ 1 ][ 2 ] * ( symMatrixB[ 4 ] * matrixA[ 2 ][ 0 ] +
                                          symMatrixB[ 3 ] * matrixA[ 2 ][ 1 ] +
                                          symMatrixB[ 2 ] * matrixA[ 2 ][ 2 ] );

Rearranging and using fma, I count 12.

rrsettgast avatar Jul 12 '20 05:07 rrsettgast
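To make the count of 12 concrete, here is a sketch of the factored expression written with explicit `std::fma` calls next to the fully expanded form. The function names `entry3Expanded` and `entry3Fma` and the raw-array signatures are assumptions for illustration, not the tensorOps interface. Each parenthesized group costs one multiply and two fmas, and combining the three groups costs one more multiply and two fmas, for 4 multiplies and 8 fmas in total.

```cpp
#include <cmath>

// Fully expanded form, as written in the thread (A is 3x3, B holds the
// six stored entries of the symmetric matrix).
inline double entry3Expanded( double const ( &A )[ 3 ][ 3 ], double const ( &B )[ 6 ] )
{
  return A[ 1 ][ 0 ] * B[ 0 ] * A[ 2 ][ 0 ] + A[ 1 ][ 0 ] * B[ 5 ] * A[ 2 ][ 1 ] +
         A[ 1 ][ 0 ] * B[ 4 ] * A[ 2 ][ 2 ] + A[ 1 ][ 1 ] * B[ 5 ] * A[ 2 ][ 0 ] +
         A[ 1 ][ 1 ] * B[ 1 ] * A[ 2 ][ 1 ] + A[ 1 ][ 1 ] * B[ 3 ] * A[ 2 ][ 2 ] +
         A[ 1 ][ 2 ] * B[ 4 ] * A[ 2 ][ 0 ] + A[ 1 ][ 2 ] * B[ 3 ] * A[ 2 ][ 1 ] +
         A[ 1 ][ 2 ] * B[ 2 ] * A[ 2 ][ 2 ];
}

// Factored form with explicit fma: 4 multiplies + 8 fmas = 12 operations.
inline double entry3Fma( double const ( &A )[ 3 ][ 3 ], double const ( &B )[ 6 ] )
{
  // Inner groups: one multiply followed by two fmas each.
  double const r0 = std::fma( B[ 4 ], A[ 2 ][ 2 ], std::fma( B[ 5 ], A[ 2 ][ 1 ], B[ 0 ] * A[ 2 ][ 0 ] ) );
  double const r1 = std::fma( B[ 3 ], A[ 2 ][ 2 ], std::fma( B[ 1 ], A[ 2 ][ 1 ], B[ 5 ] * A[ 2 ][ 0 ] ) );
  double const r2 = std::fma( B[ 2 ], A[ 2 ][ 2 ], std::fma( B[ 3 ], A[ 2 ][ 1 ], B[ 4 ] * A[ 2 ][ 0 ] ) );

  // Outer combination: one multiply followed by two fmas.
  return std::fma( A[ 1 ][ 2 ], r2, std::fma( A[ 1 ][ 1 ], r1, A[ 1 ][ 0 ] * r0 ) );
}
```

The two forms round differently, so for general inputs they can differ in the last bits; a benchmark comparing them should check agreement to a tolerance rather than bitwise equality.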