Adding normalized invariant measure for semi-orthogonal matrices

Open sethaxen opened this issue 2 years ago • 9 comments

Semi-orthogonal matrices are real n×k matrices X with n >= k such that X'X == I. These matrices live on the Stiefel manifold, which is compact, so one can define a uniform distribution on it. The distribution usually defined is the normalized invariant measure, so called because it is invariant under left- and right-multiplication of X by orthogonal matrices. When k=n, the unnormalized form is called the Haar measure, and when k=1, this is the usual volume measure on the sphere.

I propose adding a UniformSemiOrthogonal distribution. The main reason to add such a distribution is for sampling purposes. One can easily generate a matrix from the normalized invariant measure with the code Matrix(qr(randn(n, k)).Q). More efficient approaches are possible, but this is the simplest one.

Densities on the Stiefel manifold are generally written with respect to either the invariant measure or its normalized form. I believe VonMisesFisher writes the density wrt the normalized volume measure on the sphere, so it makes more sense to use the normalized invariant measure as the base measure, in which case the log-density would be 0.

Just like we have LKJCholesky, it would be useful to add an AbstractQVariate and a UniformSemiOrthogonalQ{<:AbstractQ} <: Distribution{AbstractQVariate}. Instead of a dense matrix, this would generate a Q in factorized form, which can be applied efficiently without ever forming a dense matrix.
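To make the shape concrete, here's a rough sketch with stand-in supertypes (none of this exists in Distributions yet, and the real Distribution type also carries a ValueSupport parameter, which I've dropped here):

```julia
using LinearAlgebra

# Stand-ins for Distributions' abstractions, just to sketch the layout:
abstract type VariateForm end
abstract type AbstractQVariate <: VariateForm end
abstract type Distribution{F<:VariateForm} end

# Hypothetical distribution over n×k semi-orthogonal matrices, with samples
# stored as an AbstractQ of type TQ (e.g. LinearAlgebra.QRPackedQ):
struct UniformSemiOrthogonalQ{TQ<:LinearAlgebra.AbstractQ} <: Distribution{AbstractQVariate}
    n::Int
    k::Int
end

d = UniformSemiOrthogonalQ{LinearAlgebra.QRPackedQ}(5, 3)
```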

sethaxen avatar Nov 05 '23 19:11 sethaxen

Code for efficiently sampling a QRPackedQ can be found at https://gist.github.com/sethaxen/4d5a6d22e56794a90ece6224b8100f08. It also supports complex matrices, and for square matrices, it also supports constraining the determinant to be 1.

qr(::Matrix) computes a QRCompactWY object; the factorization it wraps supports more level-3 BLAS operations and can thus be more efficient. If the user constructed UniformSemiOrthogonal{QRCompactWYQ}, the packed factorization could be converted to the compact form using the methods in https://github.com/JuliaLang/LinearAlgebra.jl/issues/1302.

The questions to answer are:

  • do we want to add another VariateForm?
  • AbstractQ objects are somehow simultaneously non-square and square (though primarily square). e.g. the invariant measure on size (n, k) semi-orthogonal matrices for n>=k could be stored in an AbstractQ representing a size (n, n) orthogonal matrix, but only the first k columns would actually follow the invariant measure; the matrix obtained by computing also the remaining n-k columns would not follow the invariant measure on size (n, n) orthogonal matrices. Could that be a problem?

sethaxen avatar Apr 23 '25 12:04 sethaxen

Note that in order to get a uniform distribution you need to eliminate the sign ambiguity of the QR decomposition. That is, you need to do Q, R = qr(randn(n,n)); Q*Diagonal(sign.(diag(R))). I implemented efficient functions to do this here, see random_unitary and random_isometry.
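A minimal sanity check of that recipe (a sketch; the seed and size are arbitrary), showing the sign fix is the same as forcing R's diagonal to be positive while preserving both orthogonality and the factorization:

```julia
using LinearAlgebra, Random

Random.seed!(42)  # arbitrary seed, for reproducibility
n = 4
A = randn(n, n)
F = qr(A)
D = Diagonal(sign.(diag(F.R)))
Qfix = F.Q * D   # sign-fixed orthogonal factor (a dense Matrix)
Rfix = D * F.R   # now has a strictly positive diagonal
```

Since D^2 == I, Qfix * Rfix still reconstructs A exactly (up to rounding).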

In order to handle this extra multiplication by a diagonal matrix I had to create a custom type extending QRPackedQ. I could only get this to work for square matrices, though, because as you noticed AbstractQ objects are simultaneously square and non-square, and have inconsistent multiplication rules, see JuliaLang/LinearAlgebra.jl#1172

araujoms avatar Apr 24 '25 19:04 araujoms

Note that in order to get a uniform distribution you need to eliminate the sign ambiguity of the QR decomposition.

Yes this is true and is handled by the code linked in https://github.com/JuliaStats/Distributions.jl/issues/1795#issuecomment-2824139709.

In order to handle this extra multiplication by a diagonal matrix I had to create a custom type extending QRPackedQ.

The LAPACK subroutine xGEQRFP can directly generate a QR decomposition whose R has a positive diagonal, producing factors with the same structure QRPackedQ expects. But really the only difference between it and xGEQRF (the same algorithm LinearAlgebra uses to produce a QRPackedQ) is that xGEQRFP uses the subroutine xLARFGP to compute the Householder reflectors; the downstream application of the reflectors uses exactly the same subroutines. The code I linked to uses a Julia implementation of xLARFGP, so we can build a QR whose R has a positive diagonal with no special AbstractQ type needed. For more info, see LAPACK working note 203.
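For illustration, here's a minimal real-valued sketch of the trick from the working note (Section 2): a Householder reflector H = I - τ*v*v' with v[1] = 1 such that H*x = β*e₁ and β ≥ 0. This is my own reimplementation, not the gist's code, and it omits the xnorm == 0 special case, so it assumes x has a nonzero tail:

```julia
using LinearAlgebra

function reflector_posbeta!(x::AbstractVector{T}) where {T<:Real}
    α = x[1]
    xnorm = norm(@view x[2:end])
    β = hypot(α, xnorm)        # β = ‖x‖ ≥ 0 by construction
    # The unnormalized v[1] is ν = α - β. When α > 0 this subtraction cancels
    # catastrophically, so rewrite it as ν = (α² - β²)/(α + β) = -xnorm²/(α + β).
    ν = α ≤ 0 ? α - β : -xnorm^2 / (α + β)
    τ = -ν / β                 # τ = (β - α)/β
    @views x[2:end] ./= ν      # store the normalized v in x (v[1] ≡ 1 implicitly)
    x[1] = β
    return τ
end
```

For example, for x = [3.0, 4.0] this gives β = 5 with τ = 2/5, and applying I - τ*v*v' to the original x yields [5, 0].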

I could only get this to work for square matrices, though, because as you noticed AbstractQ objects are simultaneously square and non-square, and have inconsistent multiplication rules, see JuliaLang/LinearAlgebra.jl#1172

I understand the reason for the inconsistency in size, but I wish LinearAlgebra had taken the view that Q is non-square but could be wrapped in another constructor e.g. OrthogonalQ(Q) to interpret it as square. At this point I'm considering writing a tiny single-function package that just defines its own size function with overloads for all AbstractQ objects defined in the ecosystem, so that packages like Distributions can check what size the original Q was.

sethaxen avatar Apr 24 '25 21:04 sethaxen

That's fascinating, I thought it wasn't possible to compute a numerically stable reflector with non-negative β because of the division by α - β. Do you mind if I copy your code? I'd like to simplify mine and do away with the custom type.

In any case, wouldn't this distribution be a better fit for RandomMatrices? I thought distributions over matrices were supposed to go there, not here. It already has an implementation of Stewart's algorithm. That package in general is in need of a lot of love.

araujoms avatar Apr 25 '25 09:04 araujoms

That's fascinating, I thought it wasn't possible to compute a numerically stable reflector with non-negative β because of the division by α - β.

Yeah, figuring out how to do that is the main contribution of the working note (see Section 2). In the end it's pretty straightforward.

Do you mind if I copy your code? I'd like to simplify mine and do away with the custom type.

Not at all. My gist doesn't have a license, but it would be nice if you linked to it as the original source. Some extra bonuses compared to Stewart's algorithm in RandomMatrices are:

  1. works for types other than BLAS floats
  2. avoids unnecessarily generating the strictly upper-triangular entries, which have no impact on the resulting reflectors.
  3. avoids applying the reflectors during construction of the factors. This works because, at any step of the algorithm, when we go to generate the reflector for column $j$, columns $j+1\ldots k$ are still standard normally distributed: the action of a reflector on subsequent columns leaves them standard normally distributed. So instead of generating column $j+1$ up front and applying the reflectors of columns $1 \ldots j$ to it, it suffices to generate column $j+1$ fresh after computing the reflector for column $j$.
  4. supports the Haar measure on the special orthogonal group

In any case, wouldn't this distribution be a better fit for RandomMatrices? I thought distributions over matrices were supposed to go there, not here.

I don't know about that. This package's docs do link to RandomMatrices once, but nothing indicates matrix distributions belong there. Several popular matrix distributions already live here, and more work is planned (see e.g. #1939). I don't know what the motivation behind RandomMatrices was, but it seems to have a number of dependencies that wouldn't be necessary here.

It already has an implementation of Stewart's algorithm. That package in general is in need of a lot of love.

Yeah, browsing through the code, it doesn't seem to be well-maintained, and judging by JuliaHub stats it's not widely used. I would hesitate to use it as a dependency. Also, Haar violates Distributions' API.

I still tend to think that it would be good to get an AbstractQVariate here. There's precedent for distributions of factorized forms of matrices to be in here (see CholeskyVariate and LKJCholesky). But this isn't incompatible with first making improvements to RandomMatrices.

sethaxen avatar Apr 25 '25 13:04 sethaxen

Thanks a lot, it's already committed: https://github.com/dev-ket/Ket.jl/commit/24bf99c12d3b90c94b3da6d0b7a28196a6cbc6ba

If random matrices do belong here I think RandomMatrices shouldn't be improved, instead it should be deprecated so that people don't use a zombie package.

araujoms avatar Apr 25 '25 17:04 araujoms

Why do you do

_get_log2_safmin(::Type{T}) where {T<:Real} = exponent(floatmin(T) / eps(T))
safmin_exponent = _get_log2_safmin(T)
safmin = exp2(T(safmin_exponent))
invsafmin = exp2(-T(safmin_exponent))

instead of

_get_safmin(::Type{T}) where {T<:Real} = floatmin(T) / eps(T)
safmin = _get_safmin(T)
invsafmin = inv(safmin)

?

araujoms avatar Apr 28 '25 10:04 araujoms

I honestly don't remember; it's been a long time. Perhaps I was trying to increase the value of safmin slightly to decrease the chances of invsafmin overflowing, but then I would want

_get_log2_safmin(::Type{T}) where {T<:Real} = exponent(floatmin(T) / eps(T)) + 1

sethaxen avatar Apr 28 '25 10:04 sethaxen

Indeed, that would make more sense. floatmin(T) / eps(T) is always a power of two, so the simpler version is equivalent. I ran into trouble with your version because exponent doesn't work for DoubleFloats and MultiFloats.
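For the record, the equivalence is easy to check for a built-in type (a quick sketch, Float64 only):

```julia
T = Float64
s = floatmin(T) / eps(T)  # the quantity in question
significand(s)            # 1.0: s is exactly a power of two
exp2(exponent(s))         # equals s, so the two formulations agree
inv(s)                    # finite, no overflow
```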

araujoms avatar Apr 28 '25 11:04 araujoms