OutOfMemoryError for moderately large datasets

Open ilonachan opened this issue 3 years ago • 0 comments

Problem

I wanted to use pca.js on a dataset of 30000 data points with 5 variables each. Calling getEigenVectors threw an OutOfMemoryError, so I couldn't get it to work.

Cause

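In two places in the library, a unit square matrix is created with n*n elements, where n is the number of data points. Memory use therefore grows quadratically with the size of the dataset: for n = 30000 the matrix has 9 × 10^8 entries, even though the data itself only has 30000 × 5 = 150000 values.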
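To put a rough number on the quadratic growth, here is a small back-of-the-envelope sketch. The helper name estimateSquareMatrixBytes is made up for this illustration and is not part of pca.js, and the figure of 8 bytes per entry is an assumption (the real per-entry cost depends on the JavaScript engine and how it stores the array).

```javascript
// Hypothetical helper, not part of pca.js.
// Assumes each matrix entry costs `bytesPerNumber` bytes (8 = one double);
// the true cost depends on the engine's array representation.
function estimateSquareMatrixBytes(n, bytesPerNumber = 8) {
  return n * n * bytesPerNumber;
}

const n = 30000; // number of data points from the report above
const bytes = estimateSquareMatrixBytes(n);

// 30000 * 30000 * 8 = 7.2e9 bytes
console.log((bytes / 1e9).toFixed(1) + " GB"); // 7.2 GB
```

Under that assumption the n*n matrix alone needs several gigabytes, which is more than a default Node.js or browser heap is likely to allow.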

Solution

In both locations where unitSquareMatrix is called, its result is immediately multiplied with the data itself. This step is unnecessary and can be removed completely, which solves the problem. If a preprocessing step is needed to ensure the data is stored as a valid matrix, a more efficient deep copy function can be implemented instead.
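A minimal sketch of what such a deep copy could look like, assuming the data is a plain array of row arrays of numbers and assuming, as described above, that the multiplication is only meant to leave the data unchanged. The name deepCopyMatrix is hypothetical and does not exist in pca.js.

```javascript
// Hypothetical helper, not part of pca.js.
// Copies an array of row arrays in O(n * m) time and memory,
// instead of allocating an n * n matrix first.
function deepCopyMatrix(matrix) {
  // slice() copies each row; this is enough when rows hold only numbers.
  return matrix.map((row) => row.slice());
}

const data = [[1, 2], [3, 4]];
const copy = deepCopyMatrix(data);

copy[0][0] = 99;         // mutate the copy only
console.log(data[0][0]); // 1, the original is untouched
```

For 30000 rows of 5 values this allocates 150000 numbers, so the copy stays proportional to the size of the input.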

ilonachan, May 19 '22 10:05