nalgebra
gemm_tr, gemm_ad do not use matrixmultiply
It seems that gemm_tr and gemm_ad currently do not use matrixmultiply for larger matrices. I noticed this while profiling after changing some of my performance-sensitive code. In fact, pre-computing let a_t = a.transpose() and then calling gemm(1.0, &a_t, &b, 1.0) was significantly faster than calling gemm_tr directly.
Hi!
That's right, they don't use matrixmultiply currently. I suppose we could make gemm_tr work with matrixmultiply by adjusting the row and column strides accordingly. The gemm_ad method, on the other hand, can't use matrixmultiply because matrixmultiply does not support complex numbers. The exception is f32 and f64 matrices, for which gemm_ad is equivalent to gemm_tr.
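To illustrate the stride idea for gemm_tr, here is a minimal sketch using only the standard library. It is not nalgebra's or matrixmultiply's code: strided_gemm and gemm_tr_via_strides are hypothetical names, the kernel is a naive triple loop, and column-major storage is assumed. The point is only that the transpose of A never has to be materialized, because swapping A's row and column strides makes the same buffer read as its transpose.

```rust
/// Naive strided GEMM: C <- alpha * A * B + beta * C.
/// A is m x k, B is k x n, C is m x n.
/// Element (i, j) of each matrix lives at `data[i * rs + j * cs]`,
/// where `rs` and `cs` are that matrix's row and column strides.
/// (Hypothetical helper for illustration, not a library function.)
fn strided_gemm(
    m: usize, k: usize, n: usize,
    alpha: f64,
    a: &[f64], rsa: usize, csa: usize,
    b: &[f64], rsb: usize, csb: usize,
    beta: f64,
    c: &mut [f64], rsc: usize, csc: usize,
) {
    for i in 0..m {
        for j in 0..n {
            // Dot product of row i of A with column j of B.
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * rsa + p * csa] * b[p * rsb + j * csb];
            }
            let idx = i * rsc + j * csc;
            c[idx] = alpha * acc + beta * c[idx];
        }
    }
}

/// C <- alpha * A^T * B + beta * C for column-major buffers,
/// where A is stored as k x m, B as k x n, and C as m x n.
/// (Hypothetical helper for illustration.)
fn gemm_tr_via_strides(
    m: usize, k: usize, n: usize,
    alpha: f64, a: &[f64], b: &[f64],
    beta: f64, c: &mut [f64],
) {
    // A column-major k x m matrix has row stride 1 and column stride k.
    // Its transpose (m x k) is the same buffer with the strides swapped:
    // row stride k, column stride 1. No copy of A is made.
    strided_gemm(m, k, n, alpha, a, k, 1, b, 1, k, beta, c, 1, m);
}
```

With A stored column-major as a 3 x 2 matrix, calling gemm_tr_via_strides(2, 3, 2, 1.0, &a, &b, 1.0, &mut c) accumulates A^T * B into the 2 x 2 buffer c. A real fix would pass the swapped strides to the optimized kernel instead of this triple loop.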
Ah, I see. I had totally forgotten that matrixmultiply does not support complex numbers, and I also did not know that it doesn't natively support transposition. Thanks for explaining!
I assume nothing has happened on this issue? I'd be willing to try modifying gemm_tr to use matrixmultiply if nobody else wants to work on it. The performance difference is substantial, and also quite surprising to someone who doesn't know about it. Perhaps a warning in the documentation for the tr_mul and gemm_tr methods would be appropriate until this is fixed?