monolish
Implement transpose() on GPU
ref: https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-geam
This implementation is delayed. Dropped from 0.14.2.
https://stackoverflow.com/questions/13782012/how-to-transpose-a-matrix-in-cuda-cublas
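
For reference, the geam approach linked above could look roughly like the sketch below. This is only an illustration, not monolish's implementation: the helper name transpose_on_gpu is hypothetical, and it assumes a dense row-major M x N matrix of doubles held in host memory. Since cuBLAS is column-major, the row-major input is passed as an N x M column-major matrix and transposed with CUBLAS_OP_T, using alpha = 1 and beta = 0.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of monolish): transposes a row-major
// M x N matrix on the GPU and returns the row-major N x M result.
std::vector<double> transpose_on_gpu(const std::vector<double> &A, int M, int N) {
  if (M <= 0 || N <= 0 || A.size() != static_cast<std::size_t>(M) * N) {
    throw std::invalid_argument("A must hold exactly M * N elements");
  }

  const std::size_t bytes = A.size() * sizeof(double);
  std::vector<double> C(A.size());
  double *d_A = nullptr;
  double *d_C = nullptr;
  cublasHandle_t handle = nullptr;

  // Allocate device buffers, create the cuBLAS handle, upload the input.
  bool ok = cudaMalloc(reinterpret_cast<void **>(&d_A), bytes) == cudaSuccess &&
            cudaMalloc(reinterpret_cast<void **>(&d_C), bytes) == cudaSuccess &&
            cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS &&
            cudaMemcpy(d_A, A.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

  if (ok) {
    const double alpha = 1.0;
    const double beta = 0.0;
    // geam computes C = alpha * op(A) + beta * op(B).
    // Column-major view: the input is N x M with lda = N, so op(A) = A^T
    // is M x N. Stored with ldc = M, that buffer read as row-major is the
    // N x M transpose. B aliases C (in-place form) and is scaled by beta = 0.
    ok = cublasDgeam(handle, CUBLAS_OP_T, CUBLAS_OP_N, M, N,
                     &alpha, d_A, N,
                     &beta, d_C, M,
                     d_C, M) == CUBLAS_STATUS_SUCCESS &&
         cudaMemcpy(C.data(), d_C, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
  }

  // Release resources on both the success and the failure path.
  if (handle != nullptr) cublasDestroy(handle);
  cudaFree(d_A);
  cudaFree(d_C);

  if (!ok) throw std::runtime_error("CUDA or cuBLAS call failed");
  return C;
}
```

Calling transpose_on_gpu({1, 2, 3, 4, 5, 6}, 2, 3) on the 2 x 3 matrix [1 2 3; 4 5 6] should return {1, 4, 2, 5, 3, 6}, i.e. the 3 x 2 matrix [1 4; 2 5; 3 6]. A real implementation would presumably keep the data on the device and reuse one handle instead of allocating and copying on every call.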