scikit-dimension icon indicating copy to clipboard operation
scikit-dimension copied to clipboard

Incorrect implementation of Participation Ratio

Open RylanSchaeffer opened this issue 2 years ago • 2 comments

I just noticed what I think might be a problem with your implementation of Participation Ratio!

Suppose one is given a matrix X with shape (num samples, num features). The PR is given by square(sum(eigenvalues)) / sum(square(eigenvalues)) of X^T X, but the implementation uses PCA.

https://github.com/scikit-learn-contrib/scikit-dimension/blob/1ddee0f7d4bfdb9ee44ada956a5c7c55950248b0/skdim/id/_PCA.py#L149

This is incorrect because PCA first demeans the data. I don't think that's correct.

RylanSchaeffer avatar Sep 24 '22 03:09 RylanSchaeffer

I couldn't find the original reference of participation ratio, maybe you know it ? It seems usual practice to refer to PR of eigenvalues of the covariance matrix. But I can add an option to return eigenvalues without centering if you need it for your application, thanks for the suggestion!

j-bac avatar Sep 26 '22 11:09 j-bac

I also double checked, and I'm not sure either. Regardless, an optional flag would be awesome, thank you!

RylanSchaeffer avatar Sep 26 '22 17:09 RylanSchaeffer