neo
Implement row-major CUDA matrices
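The relevant cuBLAS operations (gesv, gemm, geam) take a transpose flag (CUBLAS_OP_T) to work with transposed matrices. Since a row-major matrix has the same memory layout as its column-major transpose, this should be enough to cover matrices stored in row-major order.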
This would also make it possible to support t() in the CUDA version.
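
To illustrate the idea, here is a minimal CPU-only sketch, not the library's implementation. `gemm_colmajor` is a hypothetical stand-in that mimics only the transposition semantics of a column-major GEMM such as `cublasSgemm` (with alpha = 1 and beta = 0), and `Op::T` plays the role of `CUBLAS_OP_T`. All names below are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUBLAS_OP_N / CUBLAS_OP_T.
enum class Op { N, T };

// Element (i, j) of op(M), where M is stored column-major with leading dimension ld.
static float at(const std::vector<float>& m, std::size_t ld, Op op,
                std::size_t i, std::size_t j) {
    return op == Op::N ? m[i + j * ld] : m[j + i * ld];
}

// Reference column-major GEMM: C (m x n) = op(A) (m x k) * op(B) (k x n).
// Mimics the argument order of a BLAS-style gemm, without alpha/beta.
void gemm_colmajor(Op opA, Op opB, std::size_t m, std::size_t n, std::size_t k,
                   const std::vector<float>& a, std::size_t lda,
                   const std::vector<float>& b, std::size_t ldb,
                   std::vector<float>& c, std::size_t ldc) {
    assert(c.size() >= ldc * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += at(a, lda, opA, i, p) * at(b, ldb, opB, p, j);
            }
            c[i + j * ldc] = acc;
        }
    }
}

// Row-major inputs A (m x k) and B (k x n), column-major output C (m x n).
// A row-major buffer read as column-major is the transpose, so passing
// Op::T for both operands recovers A and B.
std::vector<float> matmul_rowmajor_inputs(const std::vector<float>& a,
                                          const std::vector<float>& b,
                                          std::size_t m, std::size_t n,
                                          std::size_t k) {
    std::vector<float> c(m * n);
    gemm_colmajor(Op::T, Op::T, m, n, k, a, k, b, n, c, m);
    return c;
}

// Row-major inputs and row-major output, without any transpose flag:
// compute C^T = B^T * A^T by swapping the operands and the m/n dimensions.
std::vector<float> matmul_rowmajor(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t m, std::size_t n,
                                   std::size_t k) {
    std::vector<float> c(m * n);
    gemm_colmajor(Op::N, Op::N, n, m, k, b, n, a, k, c, n);
    return c;
}
```

Under the same assumption, t() would not need to move any data: it could swap the stored dimensions and flip the flag that is passed to the next operation.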