tensorly
Randomised_CP function throws a Singular Matrix error
Steps or Code to Reproduce
import tensorly as tl
import ivy
tl.set_backend("numpy")
a = tl.ones((2,3,4))
z = tl.decomposition.randomised_parafac(a, 2, 1, n_iter_max=1, init="svd", max_stagnation=1, tol=0.03125, random_state=1)
print(z)
Error
Traceback (most recent call last):
File "/workspaces/ivy/test2.py", line 7, in <module>
z = tl.decomposition.randomised_parafac(a, 2, 1, n_iter_max=1, init="svd", max_stagnation=1, tol=0.03125, random_state=1)
File "/opt/miniconda/envs/multienv/lib/python3.10/site-packages/tensorly/decomposition/_cp.py", line 726, in randomised_parafac
factor = tl.transpose(tl.solve(pseudo_inverse, factor))
File "/opt/miniconda/envs/multienv/lib/python3.10/site-packages/tensorly/backend/__init__.py", line 206, in wrapped_backend_method
return getattr(
File "/opt/fw/mxnet/numpy/linalg/linalg.py", line 409, in solve
r = gufunc(a, b, signature=signature, extobj=extobj)
File "/opt/fw/mxnet/numpy/linalg/linalg.py", line 112, in _raise_linalgerror_singular
raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix
The snippet runs successfully when rank == 1 but fails for all other values. Is there a bound on the value of rank for a given tensor?
Versions
numpy version: 1.25.2
Hi! Here I am offering you my findings on this problem: the parameter R you referred to is the number of components, i.e. rank-one tensors, you would like to have in the result of your CP decomposition. In the reference given for randomised_parafac() (Casey Battaglino, Grey Ballard and Tamara G. Kolda, "A Practical Randomized CP Tensor Decomposition"), section 3.1 discusses the smallest number of rank-one tensors in an exact CP decomposition, i.e. the tensor rank. Here I am citing a few relevant points:
- There is no straightforward algorithm to determine the rank of a specific tensor. In fact, the problem is NP-hard.
- In practice, the rank of a tensor is determined numerically by fitting various rank-R CP models. (From section 3.4:) Ideally, if the data is noise-free, and we have a procedure for calculating CP with a given number of components, then we can do that for R = 1, 2, 3, … and stop at the first value of R that gives a fit of 100%. However, there are many problems with this procedure. … When the data are noisy (as is frequently the case), then fit alone cannot determine the rank in any case; …
- ALS, today's workhorse method for computing CP at a given R, can take many iterations to converge, and is not guaranteed to converge to the global minimum or even to a stationary point. The authors also summarize typical ranks of three-way tensors over $\mathbb{R}$, with and without frontal slice symmetry.
(Btw, I checked the arXiv paper again today and could not find the tensor rank section any more. Not sure why but I am trying to find the version I read a few days ago.)
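To make the fit-based procedure quoted above concrete, here is a minimal sketch (my own illustration, not code from the paper or from tensorly's docs) that fits CP models of increasing rank with the non-randomized parafac and reports the fit. In floating point the fit rarely hits exactly 100%, so in practice one looks for the value of R where the fit plateaus:

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

tl.set_backend("numpy")
rng = np.random.default_rng(0)

# Build a synthetic tensor of known CP rank 3, for illustration.
A, B, C = (rng.random((dim, 3)) for dim in (4, 5, 6))
tensor = tl.tensor(np.einsum("ir,jr,kr->ijk", A, B, C))

for R in range(1, 6):
    cp = parafac(tensor, rank=R)
    rec = tl.cp_to_tensor(cp)
    fit = 1 - tl.norm(tensor - rec) / tl.norm(tensor)
    print(f"R={R}: fit={float(fit):.4f}")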
Regarding the singular matrix error, specifically
factor = tl.transpose(tl.solve(pseudo_inverse, factor))
In numpy/linalg/linalg.py
def solve(a, b):
"""
Solve a linear matrix equation, or system of linear scalar equations.
Computes the "exact" solution, `x`, of the well-determined, i.e., full
rank, linear matrix equation `ax = b`.
…
"""
It seems that the singular matrix error occurs when a in ax = b is not invertible, which in this case means pseudo_inverse is not invertible. We can confirm this by inspecting the determinant of pseudo_inverse:
############## START
R = 2
############## ITERATE
======== iteration 0
------------- mode 0
det of pseudo_inverse: 4.791666666666669
------------- mode 1
det of pseudo_inverse: 4.397817528932697e-29
------------- mode 2
det of pseudo_inverse: 0.0
Traceback (most recent call last):
...
numpy.linalg.LinAlgError: Singular matrix
Inspecting the calculation steps:
711 kr_prod, indices_list = sample_khatri_rao(
712 factors, n_samples, skip_matrix=mode, random_state=rng
713 )
…
724 pseudo_inverse = tl.dot(tl.transpose(kr_prod), kr_prod)
There can be multiple causes for det(pseudo_inverse) = 0, analytical or numerical.
Analytically, kr_prod is the sampled Khatri-Rao product of size (n_samples, R), and pseudo_inverse = tl.dot(tl.transpose(kr_prod), kr_prod) is of size (R, R). For pseudo_inverse to be invertible, kr_prod must have full column rank, i.e. R non-zero singular values.
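To illustrate, here is a rough reconstruction of those steps (a sketch, not the library's exact code path: random factor matrices stand in for the real initialization, and a single uniformly drawn row stands in for sample_khatri_rao):

import numpy as np
import tensorly as tl
from tensorly.tenalg import khatri_rao

tl.set_backend("numpy")
rng = np.random.default_rng(1)

R = 2
# Factor matrices for the (2, 3, 4) example at R = 2.
factors = [tl.tensor(rng.random((dim, R))) for dim in (2, 3, 4)]

# Khatri-Rao product of all factors except mode 0, then one sampled row,
# roughly what sample_khatri_rao returns when n_samples = 1.
kr = khatri_rao(factors, skip_matrix=0)          # shape (3 * 4, R)
kr_prod = kr[rng.integers(0, kr.shape[0], 1)]    # shape (1, R)

pseudo_inverse = tl.dot(tl.transpose(kr_prod), kr_prod)  # (R, R), rank <= 1
print(np.linalg.det(pseudo_inverse))             # ~0, so tl.solve must fail

With n_samples = 1 the Gram matrix is the outer product of a single row with itself, so its rank is at most 1 and its determinant is 0 for any R > 1.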
Numerically, one possible reason is that column normalization of the updated factors is currently not implemented in randomised_parafac(). I'm not sure why this step is skipped.
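For comparison, the column-normalization step used in standard (non-randomized) CP-ALS looks roughly like the sketch below; normalize_columns is a hypothetical helper written for illustration, not a tensorly function:

import numpy as np

def normalize_columns(factor, eps=1e-12):
    # Per-column Euclidean norms of the factor matrix.
    weights = np.linalg.norm(factor, axis=0)
    # Guard against all-zero columns so we never divide by zero.
    safe = np.where(weights > eps, weights, 1.0)
    return factor / safe, weights

After each factor update the norms are absorbed into a separate weights vector, which keeps the factor entries well scaled and the Gram matrices better conditioned.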
chaoyihu is right about what the issue is here. I'll elaborate a bit further.
The issue is not with the choice of rank, but rather with the choice of the number of samples (n_samples), the third parameter of randomised_parafac. In the ALS algorithm for CP decomposition, each step involves solving a highly overdetermined linear least-squares problem, i.e., something of the form $\min_X \| AX - B \|$ where $A$ is tall and skinny. What randomised_parafac does is draw a subset of the rows of this linear system uniformly at random. The number of rows drawn is determined by n_samples. When this is set to 1, as in your example, only a single row is drawn. When $R = 1$, this doesn't break the code, since you end up with a $1 \times 1$ linear system which is (typically) invertible. However, for $R > 1$ you end up with an underdetermined linear system.
To see why this is a problem, suppose $S$ is the sampling operator. Then the sampled least-squares problem is $\min_X \| SAX - SB \|$. As implemented, randomised_parafac solves this via the normal equations, i.e., by reformulating it as the problem of solving $(SA)^\top (SA) X = (SA)^\top (SB)$ for $X$. As chaoyihu points out, this system is solved with numpy's solve function, which can only handle full-rank problems, i.e., when $(SA)^\top (SA)$ has full rank. If n_samples < rank, this matrix will always be rank-deficient and solve won't work.
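Here is a small self-contained numpy sketch of the sampled normal equations (the sizes are hypothetical): with n_samples well above $R$ the solve goes through, while a single sampled row reproduces the failure:

import numpy as np

rng = np.random.default_rng(0)
m, R = 100, 2
A = rng.random((m, R))                 # tall-and-skinny design matrix
B = rng.random((m, 5))

rows = rng.integers(0, m, size=10)     # uniform row sample, n_samples = 10
SA, SB = A[rows], B[rows]

# Normal equations for min ||SA X - SB||: (SA)^T (SA) X = (SA)^T (SB)
X = np.linalg.solve(SA.T @ SA, SA.T @ SB)    # works: Gram matrix has rank R

SA1, SB1 = A[[0]], B[[0]]              # n_samples = 1 < R
# np.linalg.solve(SA1.T @ SA1, SA1.T @ SB1)  # raises LinAlgError: Singular matrix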
In practice, since the sampling is done uniformly at random, there's a chance that the matrix $(SA)^\top (SA)$ is rank-deficient even if n_samples $\geq$ rank. It is therefore a good idea to choose n_samples to be quite a bit bigger than rank.
As a side note, randomised_parafac doesn't actually implement any of the methods by Battaglino et al. That paper uses randomized trigonometric transforms to first mix the rows before sampling. The way randomised_parafac is currently implemented, it's easy to construct an adversarial example which breaks the method (giving the kind of singular matrix error you noticed) with high likelihood, even when a large number (say, half) of all rows are included in the linear least-squares problems.
@OsmanMalik thanks for the comment! Could you elaborate on the issue with the randomized parafac? Would be awesome if you are able to add a fix! :)
@JeanKossaifi Suppose for example that the design matrix $A$ is tall-and-skinny with the following properties: $a_{11} = 1$, $a_{1j} =0$ for $j > 1$, $a_{i1} = 0$ for $i > 1$, the other entries are i.i.d. Gaussian (or anything that makes the submatrix $(a_{ij})_{i>1, j>1}$ full rank). Then, $A$ is full-rank. Any sampling operator that fails to draw the top row of $A$ will result in a rank-deficient sampled design matrix (first column of $SA$ will be zero), and hence in general the solution to the sampled linear system will be inaccurate. To ensure we include the top row with probability at least, say, 0.5 we need to draw at least half of all rows of $A$ when doing uniform sampling.
That's how the construction of the adversarial example goes. Of course, in practice things usually end up being a lot nicer than an adversarial example, so uniform sampling should work adequately and perform similarly to a sampler with theoretical guarantees, like those based on leverage score sampling or fast trigonometric transforms.
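A quick numpy sketch of that construction (with arbitrary sizes): any sample that misses the top row leaves the sampled design matrix rank-deficient, even when half of all rows are drawn:

import numpy as np

rng = np.random.default_rng(0)
m, R = 1000, 3

A = np.zeros((m, R))
A[0, 0] = 1.0                               # the lone nonzero of column 0
A[1:, 1:] = rng.standard_normal((m - 1, R - 1))
print(np.linalg.matrix_rank(A))             # R: A itself is full rank

rows = rng.integers(1, m, size=m // 2)      # a large sample that misses row 0
SA = A[rows]
print(np.linalg.matrix_rank(SA))            # R - 1: column 0 of SA is all zeros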
Would love to add a fix, but have limited time these days. Will take a stab if I find the time!