CUDA.jl
CUDA.jl copied to clipboard
Schur decomposition
Schur decomposition is implemented efficiently on GPU in PyTorch, Jax, and MatLab. It is an essential ingredient for the efficient computation of matrix square root (sqrtm).
It would be ideal to have this decomposition implemented directly in CUDA.jl. The only alternative I see at the moment is full-on diagonalization which is less efficient and less stable.