array-api icon indicating copy to clipboard operation
array-api copied to clipboard

WIP: Add specifications for computing eigenvalues and eigenvectors

Open kgryte opened this issue 4 years ago • 8 comments

This PR is currently a WIP.

This PR

  • specifies interfaces for computing eigenvalues and eigenvectors.
  • is derived from comparing signatures across array libraries.

Notes

  • This PR is blocked by lack of complex number support. Real-valued matrices may have complex-valued eigenvalues, so limiting inputs to real-valued arrays is not sufficient. Torch currently returns an nx2 eigenvalue matrix, where each row consists of a real and an imaginary component. This is a possibility, but would seem more prudent to hold off until complex number support is included in the spec.

  • In contrast to NumPy's UPLO keyword pattern (following LAPACK), this PR follows Torch and uses an upper keyword to have better consistency with other linalg spec'd functions.

  • Open question whether a separate eigvals API is needed in addition to eig. For example, in SciPy, can set left and right to False to only return eigenvalues. Presumably something similar could be done here to consolidate into a single API.

    • Answer: having dedicated APIs deemed better in order to avoid polymorphic output and to simplify API surfaces.
  • Open question whether, like SciPy, the spec should support returning either left or right or both eigenvectors. Supporting one or both is supported outside of the PyData ecosystem (e.g., in LAPACK and MATLAB).

    • Answer: same as previous answer.

kgryte avatar Jan 14 '21 11:01 kgryte

Complex number support was raised in #102. I should have added eigensolvers being another big application there! 😞 Thanks for raising this need again, @kgryte.

leofang avatar Jan 14 '21 19:01 leofang

One thought following the resolution https://github.com/data-apis/array-api/pull/105 could be that we still accept this PR to define the signatures clearly, but say that the actual standardization will come along with the complex number support in the next version of Array API. This way the array libraries like CuPy which already support complex number can work out the support very quickly.

leofang avatar Jan 14 '21 19:01 leofang

One thing interesting (which I mentioned in today's meeting) is that ROCm currently does not provide an eigensolver for general Hermitian matrices, only for tridiagonal matrices: https://rocsolver.readthedocs.io/en/latest/userguide_api.html#eigensolvers.

leofang avatar Jan 14 '21 20:01 leofang

eigh has landed for PyTorch, with UPLO not upper: https://pytorch.org/docs/master/linalg.html#torch.linalg.eigh. linalg.eig is still going to take a little while though.

rgommers avatar Jan 23 '21 20:01 rgommers

One thought following the resolution #105 could be that we still accept this PR to define the signatures clearly, but say that the actual standardization will come along with the complex number support in the next version of Array API. This way the array libraries like CuPy which already support complex number can work out the support very quickly.

Yes, working out the signatures that we expect to add to the next version makes sense to me.

rgommers avatar Jan 23 '21 20:01 rgommers

@rgommers I am guessing that Torch's decision to use UPLO is for 1:1 NumPy compatibility? Not clear to me otherwise why this would be desirable over upper (or, in general, a keyword argument which accepts a boolean value).

kgryte avatar Jan 25 '21 17:01 kgryte

@rgommers I am guessing that Torch's decision to use UPLO is for 1:1 NumPy compatibility? Not clear to me otherwise why this would be desirable over upper (or, in general, a keyword argument which accepts a boolean value).

Yes indeed, just for compat. API design wise UPLO makes no sense at all.

rgommers avatar Jan 26 '21 05:01 rgommers

I have been doing the annual linalg check of SciPy overhaul and currently linalg.eig is underway.

It seems to me that uplo or lower=True is a LAPACK plumbing issue rather than a high-level user choice. In other words, to be able to use half of the array at the Python level is quite a rarity in my limited experience so far (CPU compute, dense arrays, <32gb size). I think it is much better to get it out of the way and let the users use low level LAPACK wrappers themselves directly for such usage.

At the face value this might seem like a regression but as I recently convinced myself about it, it really doesn't add anything interesting because the array is already allocated. In fact the rectangular packing or other ways of allocating the array is way more economical instead of forcing eig or eigh to serve every use case in every library. Hence I'd definitely recommend checking out the options for low level 1D storage of half triangle data should there be a need for this.

Similarly, eigvals and eigvalsh is, I think, just API clutter. eig and eigh is in my opinion just good enough with already left=False default. For the common usage eig(..., right=False) is not really a mind-bender or finger-twister to turn off the eigvector computation.

We have always been looking at these functions as the thin-wrappers of the LAPACK routines but I think we should clearly separate the UX of Python funcs eig, eigh, svd from the low-level plumbing that we the maintainers should be doing regardless. And I think we can do a lot in terms of usability and convenience. The more important issue is the unfortunate intermediate array creations for incompatible stride/memory layout arrays etc. But that is a whole another story.

I'm curious to what others think about this.

ilayn avatar Feb 18 '22 10:02 ilayn

@kgryte I assume this PR has a successor and so can be closed?

leofang avatar Dec 21 '22 01:12 leofang

@leofang Yeah, we should probably just close this one out and create a new PR for the relevant APIs, rather than try to retrofit the current PR.

kgryte avatar Dec 21 '22 01:12 kgryte