scikit-dimension
Question: Why is d = k + 1 for Kaiser and Broken Stick?
I noticed that the ID estimates provided by the `Kaiser` and `broken_stick` methods in `id.lPCA` are `k + 1`, where `k` is the number of components to be kept according to the most commonly used implementations of these rules (i.e. keep only components with an eigenvalue > 1 [Kaiser], or keep only components with greater than expected explained variance [broken stick]).
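To make the comparison concrete, here is a minimal sketch of how `k` is usually computed under the two conventional rules. It uses plain NumPy and does not call scikit-dimension. The function names `kaiser_k` and `broken_stick_k` are hypothetical, and the Kaiser version assumes eigenvalues from a correlation-matrix PCA, so that 1 is the average eigenvalue.

```python
import numpy as np


def kaiser_k(eigenvalues):
    """Conventional Kaiser rule: count eigenvalues greater than 1.

    Assumes the eigenvalues come from a correlation-matrix PCA.
    """
    ev = np.asarray(eigenvalues, dtype=float)
    return int(np.sum(ev > 1.0))


def broken_stick_k(eigenvalues):
    """Conventional broken-stick rule: count the leading components whose
    share of variance exceeds the broken-stick expectation."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    p = ev.size
    observed = ev / ev.sum()
    # expected[i] = (1/p) * sum_{j=i+1}^{p} 1/j for 0-based index i
    expected = np.cumsum(1.0 / np.arange(p, 0, -1))[::-1] / p
    exceeds = observed > expected
    if exceeds.all():
        return p
    # Index of the first component that fails the test = number kept
    return int(np.argmin(exceeds.astype(int)))


# Illustrative eigenvalues (made up for this example)
eigs = [2.5, 1.2, 0.2, 0.1]
print(kaiser_k(eigs), broken_stick_k(eigs))
```

With these made-up eigenvalues both rules give `k = 2`, whereas the behaviour described above would report 3.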
I'm wondering what the thinking was behind this choice, and if there are any papers I can cite justifying this modification.
Thanks!
I think this is indeed non-standard. It is mentioned in the docstring but might be confusing, so I can make the change. There are also some alternative/modified versions of the Kaiser rule, e.g. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008591