
From the docs it is not clear that sample selection works analogously to feature selection

Open agoscinski opened this issue 1 year ago • 2 comments

The examples even have a section titled Feature and Sample Selection, but it contains no example notebook for sample selection. https://scikit-matter.readthedocs.io/en/latest/tutorials.html

agoscinski, Mar 08 '23 16:03

I'm not sure exactly what you mean by this. The API reference for Feature and Sample Selection states:

"scikit-matter contains multiple data sub-selection modules, primarily corresponding to methods derived from CUR matrix decomposition and Farthest Point Sampling. In their classical form, CUR and FPS determine a data subset that maximizes the variance (CUR) or distribution (FPS) of the features or samples. These methods can be modified to combine supervised and unsupervised learning, in a formulation denoted PCov-CUR and PCov-FPS. For further reading, refer to [Imbalzano2018] and [Cersonsky2021].

These selectors can be used for both feature and sample selection, with similar instantiations. Currently, all sub-selection methods extend GreedySelector, where at each iteration the model scores each feature or sample (without an estimator) and chooses that with the maximum score."

https://scikit-matter.readthedocs.io/en/latest/selection.html
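
To make the "score each feature or sample, pick the maximum" idea from the quoted passage concrete, here is a minimal NumPy sketch of greedy farthest point sampling. It is not scikit-matter's implementation; the function name `greedy_fps` and its parameters are made up for illustration. The only point is that selecting samples and selecting features are the same loop applied along a different axis.

```python
import numpy as np


def greedy_fps(X, n_to_select, axis=0, initial=0):
    """Hypothetical helper: greedy farthest point sampling.

    axis=0 selects rows (samples), axis=1 selects columns (features).
    """
    # Treat whatever we select from as "points": rows of X, or rows of X.T.
    points = X if axis == 0 else X.T
    selected = [initial]
    # Score of each candidate: distance to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[initial], axis=1)
    while len(selected) < n_to_select:
        # Greedy step: take the candidate with the maximum score.
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        # Update each candidate's distance to the nearest selected point.
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))  # 20 samples, 5 features

sample_idx = greedy_fps(X, n_to_select=4, axis=0)   # indices of rows
feature_idx = greedy_fps(X, n_to_select=2, axis=1)  # indices of columns
```

In this sketch, feature selection on `X` gives the same indices as sample selection on `X.T`, which is the analogy the issue is asking the docs to spell out.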

victorprincipe, Mar 08 '23 16:03

This is the current tutorials page: [screenshot: scikit-matter-sample-selection-tutorial-page]

I agree that it is written in the API reference, but we had a user who wasn't sure from the examples how to use sample selection. So we can improve this by changing an existing example or adding a new one.

agoscinski, Mar 09 '23 11:03