
PMPrincipalComponentAnalyser - Does not work if num(rows) < num(cols)

Open nikhilpinnaparaju opened this issue 5 years ago • 3 comments

Code to Reproduce Error

|a pca |
a := PMMatrix rows: #(#(-1 -1 1) #(-2 -1 2)).
pca := PMPrincipalComponentAnalyserSVD new componentsNumber: 2.
pca fit: a.
pca transformMatrix.

SubscriptOutOfBounds Error Raised

nikhilpinnaparaju avatar May 12 '19 15:05 nikhilpinnaparaju

This is specific to SVD as PMPrincipalComponentAnalyserJacobiTransformation works.

a := PMMatrix rows: #(#(-1 -1 1) #(-2 -1 2)).
pca := PMPrincipalComponentAnalyserJacobiTransformation new componentsNumber: 2.
pca fit: a.

AtharvaKhare avatar May 14 '19 16:05 AtharvaKhare

This happens due to the following lines:
https://github.com/PolyMathOrg/PolyMath/blob/8e663a8f998375e657596a33c9d3b97a530bb0f6/src/Math-PrincipalComponentAnalysis/PMPrincipalComponentAnalyserSVD.class.st#L40
https://github.com/PolyMathOrg/PolyMath/blob/8e663a8f998375e657596a33c9d3b97a530bb0f6/src/Math-Matrix/PMSingularValueDecomposition.class.st#L60-L63

I tried matching the values of eigenU and eigenV against sklearn's output; eigenV does not match for n(rows) < n(cols).

u after decompose of PolyMath:

a PMVector(0.7071067811865476 -0.7071067811865475)
a PMVector(0.7071067811865475 0.7071067811865476)

u after numpy.linalg.svd:

matrix([[-0.70710678, -0.70710678],
        [-0.70710678,  0.70710678]])

v after decompose of PolyMath:

a PMVector(0.5773502691896257 0.21132486540518697 -0.7886751345948129)
a PMVector(0.5773502691896257 -0.7886751345948129 0.21132486540518725)
a PMVector(0.5773502691896258 0.5773502691896257 0.5773502691896257)

v after numpy.linalg.svd:

matrix([[-0.57735027, -0.57735027, -0.57735027],
        [-0.81649658,  0.40824829,  0.40824829],
        [ 0.        , -0.70710678,  0.70710678]])

I am not well-versed in linear algebra or the theory behind SVD, but shouldn't both v match?

Edit: Tried an online calculator. u:

 0.70710678118655   -0.70710678118656
-0.70710678118655   -0.70710678118655

v:

 0.81649658092773   -0.40824829046386   -0.40824829046386
-0.57735026918962   -0.57735026918963   -0.57735026918962
 0                  -0.70710678118655    0.70710678118655

AtharvaKhare avatar May 14 '19 16:05 AtharvaKhare

The singular values of any matrix are uniquely defined, up to order (but by convention they are ordered from largest to smallest). The singular vectors u and v are uniquely determined (up to the sign) for square matrices only, so not for this example.
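To make the sign ambiguity concrete, here is a minimal Python sketch using only the standard library. The factors `u`, `s` and `vt` are hand-picked, hypothetical values for a 2x3 matrix (they are not the output of PolyMath or NumPy for the matrix in this issue). Negating a left singular vector together with its matching right singular vector changes the factors but not the product, so two libraries can print different u and v and both still be correct.

```python
import math

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def close(x, y, tol=1e-12):
    """Element-wise comparison of two matrices of the same shape."""
    return all(abs(a - b) <= tol for rx, ry in zip(x, y) for a, b in zip(rx, ry))

r = 1 / math.sqrt(2)

# Hypothetical thin SVD factors, chosen by hand for illustration only.
u = [[r, -r],
     [r,  r]]             # columns are left singular vectors
s = [[2.0, 0.0],
     [0.0, 1.0]]          # singular values, largest first
vt = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0]]    # rows are right singular vectors

a = matmul(matmul(u, s), vt)

# Negate the first left singular vector and the first right singular vector together.
u_flipped = [[-row[0], row[1]] for row in u]
vt_flipped = [[-x for x in vt[0]], vt[1]]
a_flipped = matmul(matmul(u_flipped, s), vt_flipped)

same_product = close(a, a_flipped)          # the reconstructed matrix is unchanged
factors_differ = not close(u, u_flipped)    # but the factors themselves differ
print(same_product, factors_differ)
```

A direct element-by-element comparison of u or v between two implementations is therefore not a reliable correctness check; comparing the singular values and the reconstructed product is.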

However, there is something wrong with the current SVD implementation, because it fails the ultimate test for any SVD (in PMSingularValueDecompositionTest):

testReconstruction
	"The product U * S * V^T must reproduce the decomposed matrix."

	| svd u v s reconstructed |

	svd := matrix decompose.
	u := svd leftSingularForm.
	v := svd rightSingularForm.
	s := svd sForm.

	reconstructed := u * s * v transpose.
	self assert: reconstructed closeTo: matrix

This fails for some inputs, e.g. if you use loadExample3 instead of the default loadExample1.
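For comparison, the same reconstruction check can be sketched outside PolyMath for the 2x3 matrix from the original report. The Python below is only an illustration: `thin_svd_two_rows` is a hypothetical helper that builds a thin SVD from the closed-form eigen-decomposition of the 2x2 matrix A Aᵀ. It assumes the input has exactly two rows and rank 2, and it is not the algorithm used by PolyMath or NumPy.

```python
import math

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def transpose(m):
    return [list(col) for col in zip(*m)]

def max_abs_diff(x, y):
    return max(abs(a - b) for rx, ry in zip(x, y) for a, b in zip(rx, ry))

def thin_svd_two_rows(a):
    """Thin SVD (u, s, vt) of a 2 x n matrix, assuming it has rank 2."""
    g = matmul(a, transpose(a))              # symmetric 2x2 matrix [[p, q], [q, r]]
    p, q, r = g[0][0], g[0][1], g[1][1]
    mean = (p + r) / 2
    radius = math.hypot((p - r) / 2, q)
    # Eigenvalues of A A^T are mean +/- radius; singular values are their square roots.
    sigmas = [math.sqrt(mean + radius), math.sqrt(mean - radius)]
    # Rotation angle that diagonalises the symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * q, p - r)
    c, sn = math.cos(theta), math.sin(theta)
    u = [[c, -sn], [sn, c]]                  # columns are left singular vectors
    # Right singular vectors: v_i^T = u_i^T A / sigma_i (needs sigma_i > 0).
    ut_a = matmul(transpose(u), a)
    vt = [[x / sigma for x in row] for row, sigma in zip(ut_a, sigmas)]
    s = [[sigmas[0], 0.0], [0.0, sigmas[1]]]
    return u, s, vt

a = [[-1.0, -1.0, 1.0],
     [-2.0, -1.0, 2.0]]
u, s, vt = thin_svd_two_rows(a)

# Same check as testReconstruction; vt already holds the transposed right factor.
reconstructed = matmul(matmul(u, s), vt)
error = max_abs_diff(reconstructed, a)
print(error < 1e-9)
```

The point of the sketch is that a wide matrix (fewer rows than columns) has a perfectly well-defined decomposition that passes the reconstruction test, so a failure there points at the implementation rather than at the input shape.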

khinsen avatar May 15 '19 13:05 khinsen