pca
OutOfMemoryError for moderately large datasets
Problem
I wanted to use pca.js for a dataset of 30000 elements with 5 variables each. Calling getEigenVectors caused an OutOfMemoryError, so I couldn't get this to work.
Cause
In two places in the library, a unit square matrix of n*n elements is created, where n is the number of data points. For n = 30000 that is 9×10⁸ entries, on the order of 7 GB assuming 8-byte doubles, so this quadratic scaling quickly exhausts memory.
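To make the scaling concrete, here is a minimal sketch of the pattern described above. The name unitSquareMatrix comes from the report; the function bodies are illustrative reconstructions, not the library's actual source, and they assume the "unit" matrix is the identity (which the solution below implies, since the multiply can be dropped without changing results):

```js
// Illustrative reconstruction of the problematic pattern
// (assumption: "unit square matrix" means the n-by-n identity).
function unitSquareMatrix(n) {
  // Allocates n*n numbers: for n = 30000 that is 9e8 entries,
  // roughly 7.2 GB at 8 bytes per double -- the source of the OOM.
  const m = [];
  for (let i = 0; i < n; i++) {
    const row = new Array(n).fill(0);
    row[i] = 1;
    m.push(row);
  }
  return m;
}

// Plain matrix multiply: a is n-by-n, b is n-by-d.
function multiply(a, b) {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0))
  );
}

// The pattern in question: the huge matrix is built and immediately
// multiplied with the data, which just reproduces the data.
function preprocess(data) {
  return multiply(unitSquareMatrix(data.length), data);
}
```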
Solution
In both locations where unitSquareMatrix is called, its result is immediately multiplied with the data. Since multiplying the data by a unit (identity) matrix leaves it unchanged, this step is unnecessary and can be removed completely, which solves the problem. If a preprocessing step is still needed to ensure the data is stored as a valid matrix, a more efficient deep copy function can be implemented instead.
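If such a defensive copy is wanted, a row-wise deep copy produces the same result as the identity multiply while keeping memory at O(n*d) instead of O(n²). This is a sketch of one possible replacement; deepCopyMatrix is a hypothetical name, not part of the library:

```js
// O(n*d) deep copy of a 2D numeric array -- same result as multiplying
// by an n-by-n identity matrix, without ever allocating that matrix.
function deepCopyMatrix(data) {
  return data.map(row => row.slice());
}

// Example: for a 30000 x 5 dataset this copies 1.5e5 numbers,
// rather than first building a 9e8-entry identity matrix.
const copy = deepCopyMatrix([[1, 2], [3, 4]]);
```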