scikit-dimension

Question: Why is d = k + 1 for Kaiser and Broken Stick?

Open • danlurie opened this issue 2 years ago • 1 comment

I noticed that the ID estimates returned by the Kaiser and broken_stick methods in id.lPCA are k + 1, where k is the number of components kept under the most commonly used implementations of these rules: keep only components with an eigenvalue > 1 (Kaiser), or keep only components whose explained variance is greater than expected (broken stick).
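For reference, here is a minimal NumPy sketch of what I mean by k under the two rules. It is not the scikit-dimension code: the function names `kaiser_k` and `broken_stick_k` are hypothetical, the Kaiser threshold of 1 assumes eigenvalues of a correlation matrix, and the broken-stick count assumes the common convention of stopping at the first component that falls below its expected share.

```python
import numpy as np


def kaiser_k(eigenvalues, threshold=1.0):
    """Count components whose eigenvalue exceeds `threshold` (standard Kaiser rule)."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return int(np.sum(eigenvalues > threshold))


def broken_stick_k(eigenvalues):
    """Count leading components whose explained-variance share exceeds
    the broken-stick expectation b_i = (1/p) * sum_{j=i}^{p} 1/j."""
    # Sort in descending order so component 1 is the largest.
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    p = eigenvalues.size
    explained = eigenvalues / eigenvalues.sum()
    # Reverse cumulative sum of 1/j gives sum_{j=i}^{p} 1/j for i = 1..p.
    expected = np.cumsum(1.0 / np.arange(p, 0, -1))[::-1] / p
    above = explained > expected
    # Stop at the first component that does not beat its expectation
    # (assumed convention); the all-True guard is purely defensive.
    return int(p if above.all() else np.argmin(above))


# Hypothetical eigenvalues of a 4-variable correlation matrix (they sum to 4).
eigs = [2.5, 1.2, 0.2, 0.1]
k_kaiser = kaiser_k(eigs)
k_broken_stick = broken_stick_k(eigs)
```

With these illustrative eigenvalues both rules keep the first two components, so k = 2 in the sense used above, while the estimates I get from id.lPCA are one larger than this count.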

I'm wondering what the thinking was behind this choice, and if there are any papers I can cite justifying this modification.

Thanks!

danlurie • Mar 01 '22 01:03

I think this is indeed non-standard. It is mentioned in the docstring but might be confusing, so I can make the change. There are some alternative/modified versions of Kaiser, e.g. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008591

j-bac • Apr 01 '22 09:04