Add rectangular matrix support for FullPivLu
This is an attempt to resolve issue #161.
I'm pretty uncertain about the decisions I made in FullPivLu::solve. The main problem is that, unlike in PartialPivLu, which requires square invertible matrices, FullPivLu::solve can now be given a problem that has no solution or has infinitely many solutions. Thoughts on how I went about dealing with this? The other option I saw was to require the matrix be square and invertible for FullPivLu::solve.
This is a very interesting PR!
I'll do a thorough review later, but I want to have some high level discussion first.
I must admit I hadn't even considered implementing solve for non-square matrices. Perhaps this is a good idea. However, there are some things I don't fully understand. I'll start by explaining what I find the most useful about rectangular LU decomposition. In particular, I can imagine LU decomposition of rectangular matrices is particularly useful when:
- You need the actual factors `L`, `U` and the permutation matrices, so you `unpack` the result.
- You want to estimate the rank of a rectangular matrix in a potentially much cheaper way than the SVD (which is more accurate).
Let's consider a system Ax = b, where A is m x n and b is a vector of length m. Let's consider first the case m > n. Here we have an overdetermined system, and in the general case there can be no solution to the system. However, in some special cases, namely when b is in the column space of A, the system may still have a solution.
Now let m < n. In this case we have an underdetermined system: either no solutions, or an infinite number of solutions. In any case, we still need b to lie in the column space of A for a solution to exist.
As far as I know, one typically uses a least-squares approach to these problems. When Ax = b is consistent, the least-squares procedure recovers an exact solution. When Ax = b has no solution, one instead retrieves an x which "almost" satisfies the equations, in the sense that || Ax - b || is minimal.
If one is only interested in the exact solution and one has no interest in least-squares approximations, it might be possible to use LU decomposition, although I've never seen this used for this purpose (but keep in mind I'm not at all an expert). From a brief glance, I'm intrigued by your technique. Without having carefully analyzed it, it raises some questions:
- It seems to me that this could actually work for overdetermined systems with full column rank. However, I don't think it can capture all cases when the system is rank-deficient, can it?
- I believe the same is true for underdetermined systems - there might exist solutions that the procedure does not see, and an error is returned instead. At least I think so - please correct me if I'm wrong.
I'm open to providing a solve function that actually finds a solution to the system Ax = b for rectangular A, but in order for it to be truly useful I think it needs to satisfy the following criterion:
> If a solution to the system `Ax = b` exists, then `solve` must return a valid solution to the system. Conversely, if `solve` returns a solution, it must be a proper solution to the system.
From my understanding by looking at the code, I think the proposed solve function does not quite satisfy the first requirement. In other words, it may fail to give a solution when one or more solutions to the system exist, which is unfortunate, because it cannot be relied on. Am I correct in this assumption? I'd love to hear your thoughts on this, @prismatica!
I totally agree with @andlon that if solve is to be useful, it must be able to find a solution if one exists. I think that the existing implementation will do that even in the rank-deficient case. Given a rank-deficient matrix with rank r, its LU decomposition will have a square full-rank L matrix, and a rectangular rank-deficient U matrix, with the zero vector for the last n - r rows, and non-zero entries along the diagonal for the first r rows. This means that the last n - r variables are free variables in the x vector, and can take any value in a valid solution. The first r variables are restricted based on the values taken by the free variables.
The approach taken in the PR is to set all the free variables to zero and solve the equation using just the upper-left r x r portion of the U matrix, called U' for convenience. Since this matrix is full-rank and square, there is one and only one solution to this restricted equation. If a solution to U x = b exists, there will be a solution with all free variables zero and the first r variables taking the values found by solving U' x = b. Of course, this method can find solutions to U' x = b that are not solutions to A x = b, which is why the next step is to pad the found candidate with the free variables (which are all zero) and then verify the solution by mapping it through the full matrix (using FullPivLu::Mul) and comparing against the input b.
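For concreteness, here is a rough standalone sketch of that last verification step on plain slices rather than the actual rulinalg types (the name `verify_candidate` and the tolerance handling are just illustrative, not the PR's code):

```rust
// Rough sketch of the verification step: map the padded candidate x
// through the full matrix A (row-major) and compare against b
// elementwise, with a tolerance for roundoff.
fn verify_candidate(a: &[Vec<f64>], x: &[f64], b: &[f64], tol: f64) -> bool {
    a.iter().zip(b).all(|(row, &bi)| {
        let ax: f64 = row.iter().zip(x).map(|(aij, xj)| aij * xj).sum();
        (ax - bi).abs() <= tol
    })
}
```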
I think that this implementation meets the criteria set out by @andlon. Am I missing something?
@prismatica this makes sense to me, though my opinion weighs less heavily than @Andlon's and your own on this.
@prismatica: I took a closer look now, and yes, I believe you are right! You only described the underdetermined situation, however (where U is rectangular and L is square). We must also consider the case where the system is overdetermined, in which case L is rectangular and U is square. For clarity, I'll try to outline a generalized procedure below, based on your approach. Please let me know if I've made any mistakes! I think I might also have found a somewhat better way to verify the solution (?).
Problem statement
We wish to find any solution to the problem Ax = b, or return an error if no such solution exists. We consider the decomposition A = PLUQ. Without loss of generality, it is clear that we can reformulate this as LU x' = b', where x' = Q x and b' = P^T b. Hence we need only consider the problem LU x = b, where L is m x p and U is p x n.
The introduction of the size p lets us describe the problem uniformly for m > n (in which case p = n), m = n (in which case p = m = n), and m < n (in which case p = m).
Existence of solutions
We now write
```
LU x = [ L1 ] U x = [ L1 U x ] = [ b1 ] = b
       [ L2 ]       [ L2 U x ]   [ b2 ]
```
where L1 is p x p and L2 is (m - p) x p. We note that L1 is an invertible lower triangular matrix, while L2 is just a general matrix.
Because L1 is invertible, we may write Ux = inv(L1) * b1, and we get the requirement
L2 Ux = L2 inv(L1) b1 = b2
Hence, if there exists a solution to the problem Ux = inv(L1) * b1 and the above requirement on b1 and b2 holds, then there exists a solution to the system Ax = b. Note that this solution is in general not unique unless U is invertible.
Solutions to Ux = inv(L1) * b1
We now consider the problem of a possibly rank-deficient upper trapezoidal matrix U. We let r = rank(U) and denote c = inv(L1) * b1. Because of the pivoting scheme, we can write the system in the form
```
Ux = [ U1 U2 ] [ x1 ] = [ c1 ] = c
     [ 0  0  ] [ x2 ]   [ c2 ]
```
where U1 is an invertible upper triangular matrix of dimensions r x r and U2 is r x (n - r). We now immediately get the following requirement for existence of a solution:
c2 == 0
If this holds, we may find any particular solution by realizing that x2 is a free variable which we may choose arbitrarily. Naturally, x2 = 0 is a sensible choice, in which case we obtain
x1 = inv(U1) * c1.
An algorithm for computing a solution
Since our proof was constructive, we may formulate an algorithm for computing an arbitrary solution to the system Ax = b if one exists, or returning an error if it does not.
1. Compute `c` from `L1 c = b1`. If `c2 != 0`, then abort (no solution exists).
2. If `L2 * c != b2`, then abort (no solution exists).
3. Solve the system `U1 x1 = c1`.
4. Form the solution `x = [x1, 0]`.
In practice one also needs to multiply by the appropriate permutation matrices, of course.
I suggest doing elementwise comparisons in steps 1 and 2 (in fact, we don't have to fully form the product L2 * c if we want to save allocation, but this can be a later optimization).
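To make these steps concrete, here's a rough sketch of the whole procedure on plain row-major data, ignoring the permutation matrices; `r` is the rank of U and `tol` an accuracy threshold, both taken as given. All names are illustrative and this is not meant as the library's actual API:

```rust
// L is m x p (unit lower trapezoidal), U is p x n with rank r.
// Returns Some(x) with a particular solution, or None if none exists.
fn solve_rect(l: &[Vec<f64>], u: &[Vec<f64>], b: &[f64], r: usize, tol: f64)
    -> Option<Vec<f64>>
{
    let (m, p, n) = (l.len(), u.len(), u[0].len());

    // Step 1: forward substitution for L1 c = b1 (top p x p block of L).
    let mut c = vec![0.0; p];
    for i in 0..p {
        let mut sum = b[i];
        for j in 0..i {
            sum -= l[i][j] * c[j];
        }
        c[i] = sum / l[i][i]; // L1 has unit diagonal, so this is just `sum`
    }
    // If c2 (the last p - r entries of c) is not ~zero, no solution exists.
    if c[r..].iter().any(|&v| v.abs() > tol) {
        return None;
    }

    // Step 2: check L2 * c == b2 elementwise (rows p..m of L).
    for i in p..m {
        let l2c: f64 = l[i].iter().zip(&c).map(|(lij, cj)| lij * cj).sum();
        if (l2c - b[i]).abs() > tol {
            return None;
        }
    }

    // Step 3: back substitution for U1 x1 = c1 (top-left r x r block of U).
    // Step 4: the free variables x[r..] are simply left at zero.
    let mut x = vec![0.0; n];
    for i in (0..r).rev() {
        let mut sum = c[i];
        for j in (i + 1)..r {
            sum -= u[i][j] * x[j];
        }
        x[i] = sum / u[i][i];
    }
    Some(x)
}
```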
I'm a little worried by the overhead of verifying that the solution is actually valid. It seems to me that depending on matrix dimensions it may represent a considerable cost.
OK, this took me some time. Would greatly appreciate if someone could check my steps! I haven't seen this before, @prismatica, so I really learned something new and interesting! Thanks :-)
EDIT: Fixed minor mistakes in algorithm steps.
Thanks for the clear write-up, @andlon! I might be misunderstanding a few steps in your derivation, so I have a question:
My impression is that, to a certain extent, one can choose the sizes of the L and U matrices, as long as their product has the size of the original matrix. In the FullPivLu decomposition, I think I've implemented it so that the L matrix is always square and, since it is a unit lower triangular matrix, also invertible. This is because it can be viewed as the product of some number of elementary matrices (the ones that express the Gaussian elimination that results in the U matrix). Ignoring the permutation matrices for now, one performs elementary row operations in sequence on the A matrix to get U: (L1 * L2 * ... * Ln) * A = U. Then L = inv(L1 * L2 * ... * Ln), and since elementary matrices are square and invertible, so is L. The permutation matrices just result in some new, permuted matrix A' to which the elementary matrices are applied. Is this correct, or have I missed something?
If this can be done, I think the L2 in the derivation is empty, and so we can skip a few of the steps involving it. Is this possible?
Well, let's say that we want L to be a square matrix. Then it has to be m x m, since A is m x n. If m <= n, then L is already square by virtue of the usual decomposition (and U is rectangular). Now, consider the case m > n. Using my previous notation for rectangular L (i.e. m x p), we see that
```
[ L1 ] U = [ L1 U ] = [ L1 0  ] [ U ]
[ L2 ]     [ L2 U ]   [ L2 L3 ] [ 0 ]
```
for any matrix L3 with dimensions (m - p) x (m - p). If we choose L3 such that it is lower triangular with unit diagonal, we get your proposed L matrix, and so you are indeed correct that it is possible to obtain a decomposition L U where L is square and U rectangular. However, it is not correct to say that L2 is "empty" - it just becomes part of the larger L matrix.
Note that since the "LU matrix" we produce during decompose (the matrix containing both the L and U factors) is rectangular, our L is in fact rectangular (with more rows than columns), so you cannot simply forward substitute directly (though depending on the implementation, forward substitution may still return the solution corresponding to L1; I haven't checked).
The approach involving a full square matrix L would potentially require a lot more work, as you'd have to solve a system of size m x m, which for m >> n can be very expensive compared to solving a system with L1 of size n x n and verifying the solution by performing a matrix-vector multiply with L2 (which is (m - n) x n).
Does this make sense? Please ask if I haven't been clear!
That does make sense, thanks!
Unfortunately, real life has caught up with me, and I won't be able to give this the time I'd like in the next few weeks. I'll try to get a new version of this that implements your changes up as soon as I can, but if it takes longer than it has in the past, I promise I haven't forgotten about you all :)
@prismatica: no worries! Please take as much time as you need. This PR is also very independent of other ongoing work in the library, so there should be no issues as a result :-)
Alright, sorry, I have another potentially stupid question. You say:
> Compute c2. If c2 != 0, then abort (no solution exists).
How does one compute c2? It looks to me like c2 is the product of the zero matrix and what you've called x2. Obviously, the product of the zero matrix with anything is zero, so I must be misunderstanding what your first step is.
This most recent commit doesn't address all of @andlon's concerns. It just changes how unpack works. Instead of allocating an m x m matrix for the L portion and using the internal m x n memory for the U matrix (which doesn't make sense if m >> n), unpack now decides which matrix to store in the repurposed internal memory and which to allocate. If m >> n, the L matrix will now be m x n and the U matrix will be n x n, which saves memory overall.
Also, I removed the Mul trait at @AtheMathmo's suggestion.
I'm still trying to understand @Andlon's algorithm, though it's going slowly (my own fault, of course!)
Thank you all for your patience!
@prismatica: there are no stupid questions here, so please don't apologize! And if there were, I'd surely be the champion of foolish questions under normal circumstances :-)
Sorry, I was not so explicit with regards to the meaning of some of the quantities I introduced in the above description. We define c = inv(L1) * b1, where b1 corresponds to the first p elements in the b vector. We now define c2 to be the last p - r elements of c (and c1 to be the first r elements). Hence, we must compute c by solving the system L1 * c = b1 and taking the last p - r elements to get c2. (in the above, r is the rank of U)
There is/was actually a typo in the original "algorithm" that I presented: we need to compute L2 * c and check that it's approximately equal to b2, not L2 * c1 as it said at first.
> It looks to me like `c2` is the product of the zero matrix and what you've called `x2`.
I understand your confusion. This is not its definition, but rather the requirement that c2 needs to fulfill. In other words, c2 is defined as I stated above, but for a solution to exist for the overdetermined system, c2 needs to be precisely zero! Hence we need to check whether c2 is actually zero to see if the overdetermined system has a solution. Of course, we have to take into account roundoff errors from finite precision arithmetic, which complicates matters.
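For instance, one might express that check along the following lines (a sketch only; the tolerance scaling is an assumption, not a recommendation):

```rust
// Treat c2 as zero when every entry is below a tolerance scaled by the
// problem data; the exact scaling here is only illustrative.
fn is_effectively_zero(c2: &[f64], scale: f64) -> bool {
    let tol = f64::EPSILON * scale;
    c2.iter().all(|&v| v.abs() <= tol)
}
```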
I hope this helps! Please don't hesitate to ask more questions. I think we're all learning from this :-)
I've updated the code to what I think is the algorithm listed above, but I think I must be missing something. For example, let's take the case of
```
Ax = b
A = [1 2 3; 1 2 3]
b = [1; 2]
```
Then,
```
L = [1 0; 1 1]
U = [1 2 3; 0 0 0]
```
Then, since L is already pxp, L1 = L, so c == [1;2].
I think this means that L2 is 0x0.
Since the rank is 1, U1 = [1], so [1] * c = [1;2], meaning c == [1;2]. However, A * [1;2] != b.
Is there an additional verification step required, such as the explicit map through like in the original implementation, or have I missed something again?
It may be that there are mistakes in the algorithm I proposed, but I think it works for this particular example at least! Let's go through it step by step.
1. Compute `c` from `L1 * c = b1`. As you correctly point out, we have `L1 = L` and also `b1 = b`, so we just need to solve `L * c = b`. However, note that `c != b`! That means we need to solve the system:

```
[ 1 0 ] c = [ 1 ]
[ 1 1 ]     [ 2 ]

=> c = [ 1 ]
       [ 1 ]
```
Since U has rank r = 1, we have that c1 = [ 1 ] and c2 = [ 1 ] != 0. Per step 1 in the above algorithm, this means that the system has no solution and we abort the procedure.
To see that the system actually has no solution, let's write it out with variables. The system Ax = b is the same as:
```
x1 + 2 * x2 + 3 * x3 = 1
x1 + 2 * x2 + 3 * x3 = 2
```
If we make x1, x2 and x3 satisfy the first row, then they cannot simultaneously satisfy the second row! Hence the system in question has no solution.
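If it helps, here is the forward-substitution step of this example as a tiny self-contained check (illustrative code only):

```rust
// L c = b gives c = [1, 1], so c2 = 1 != 0 and step 1 already tells
// us the system has no solution.
fn main() {
    let l = [[1.0, 0.0], [1.0, 1.0]];
    let b = [1.0, 2.0];
    let mut c = [0.0_f64; 2];
    for i in 0..2 {
        let mut sum = b[i];
        for j in 0..i {
            sum -= l[i][j] * c[j];
        }
        c[i] = sum; // unit diagonal, no division needed
    }
    assert_eq!(c, [1.0, 1.0]);
    // rank(U) = 1, so c2 = c[1..]; it is nonzero => abort.
    assert!(c[1..].iter().any(|&v| v != 0.0));
}
```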
I went through this a bit quickly, so it's very possible I've made mistakes, but I hope it should be correct. I hope this helps. Please don't hesitate to ask more questions!
OK, I finally figured it out. My apologies for all the back-and-forth and delays, and thanks for your help and patience! :)
Please let me know if there's something else I should change.